Abstract

Probabilistic operational semantics for a nondeterministic extension of pure lambda calculus 
is studied. In this semantics, a term evaluates to a (finite or infinite) distribution of values. 
Small-step and big-step semantics are both inductively and coinductively defined. Moreover, 
small-step and big-step semantics are shown to produce identical outcomes, both in call-by-value
and in call-by-name. Plotkin's CPS translation is extended to accommodate the choice
operator and shown correct with respect to the operational semantics. Finally, the expressive 
power of the obtained system is studied: the calculus is shown to be sound and complete with 
respect to computable probability distributions. 

1 Introduction 

Randomized computation is central to several areas of theoretical computer science, including
computational complexity, cryptography, the analysis of computations dealing with uncertainty, and
agent systems with incomplete knowledge. Some work has also been devoted to the design and analysis
of programming languages with stochastic aspects. For various reasons, the functional programming
paradigm seems appropriate in this context, because of the very thin gap between the realm of
programs and the underlying probability world.

The large majority of the literature on probabilistic functional programming views probability
as a monad, in the sense of Moggi (T21 fT5] . This is the case for the works by Plotkin and Jones 
on the denotational semantics of probabilistic functional programs [10], as well as for many of the
recently proposed probabilistic functional programming languages: the stochastic lambda calculus [19]
and the lambda calculi by Park, Pfenning and Thrun [15, 16]. The monadic structure of probability
distributions provides a good denotational model for the calculi and it makes evident how the 
mathematical foundations of probability can be applied directly in a natural way to the semantics 
of probabilistic programs. This makes it possible, for example, to apply this approach to the
formalization of properties of randomized algorithms in interactive theorem proving [1]. The monadic approach
seems particularly appropriate in applications, since some programming languages, like Haskell, 
directly implement monads. 

But there is another, more direct, way to endow the lambda calculus with probabilistic choice,
namely by enriching it with a binary choice operator $\oplus$. This way, we can form terms whose
behavior is chosen at random to be that of the first or of the second argument. It is not clear,
however, whether the operational theory underlying the ordinary, deterministic lambda calculus
extends smoothly to this new, probabilistic setting. The aim of this paper is precisely to start an
investigation in this direction. The object of our study will be the nondeterministic lambda calculus,
which is a minimal extension of the ordinary lambda calculus with a choice operator. The subject
of our study, on the other hand, will be the properties of two notions of probabilistic semantics
for it, namely call-by-value and call-by-name evaluation. Big-step and small-step probabilistic
semantics will be defined and proved equivalent, both when defined inductively and when defined
coinductively. CPS translations extending the ones in the literature are presented and proved
to have the usual properties enjoyed in the deterministic case. Finally, some results about the
expressive power of the obtained calculus are proved.

1.1 Related works 

The pioneering investigation in the field of stochastic functional languages is the probabilistic LCF
(PLCF in the following) proposed by Saheb-Djahromi. PLCF is a typed, higher-order calculus
based on Milner's LCF, Plotkin's PCF [18] and Plotkin's probabilistic domains (further developed
by Plotkin and Jones in [10]). In the syntax of PLCF, two kinds of abstractions are present, which
deal separately with call-by-value and call-by-name evaluation, respectively. The author declares
an explicit intent of providing "a foundation for the probabilistic study of the computation" and,
even if a number of important aspects are unexplored, the approach is interesting and related to
the present investigation. Saheb-Djahromi provides both denotational and operational semantics
for PLCF. Denotational semantics, defined in terms of probabilistic domains, is an extension of
Milner's and Plotkin's one. Operational semantics is given as a Markov chain, and an equivalence
result between the latter and a denotational model is stated and proved as an extension of Plotkin's
results.

In recent years, some lambda calculi with probabilistic features have been introduced, strongly
oriented to applications (e.g. robotics). The most developed approach is definitely the monadic
one [TSUI], based on the idea that probability distributions form a monad ([8], [10]). In [19],
Ramsey and Pfeffer introduce the stochastic lambda calculus, in which the denotations of expressions
are distributions and in which the probability monad is shown to be useful in the evaluation of
queries about the defined probabilistic model. A measure theory and a simple measure language are
introduced, and the stochastic lambda calculus is compiled into the measure language. Moreover, a
denotational semantics based upon the monad of probability measures is defined. In [14], Park
defines the typed calculus $\lambda_\gamma$, an extension of the typed lambda calculus with an integral abstraction
and a sampling construct, which bind probability and sampling variables, respectively. A system of
simple types for $\lambda_\gamma$ is also introduced. The author briefly discusses the expressive power of
$\lambda_\gamma$: the calculus is shown to be able to express the most relevant probability distributions.
$\lambda_\gamma$ does not make use of monads, which are however present in [15], in which the idea of a
calculus based on the mathematical notion of a sampling function is further developed through
the introduction of $\lambda_\bigcirc$. $\lambda_\bigcirc$ is based on the monad of sampling functions and is able to specify
probability distributions over infinite discrete domains and continuous domains. The authors also
develop a new operational semantics, called horizontal operational semantics. The calculus $\lambda_\bigcirc$ is
further studied and developed in [16].

Nondeterminism and probability, however, can find their place in a completely different way.
In [5], de'Liguoro and Piperno propose a nondeterministic lambda calculus, called $\Lambda_\oplus$. $\Lambda_\oplus$ is
nothing more than the usual, untyped $\lambda$-calculus with an additional binary operator $\oplus$ which serves
to represent binary, nondeterministic choice: $M \oplus N$ rewrites to either $M$ or $N$. The authors give
a standardization theorem and an algebraic semantics for $\Lambda_\oplus$. The classical definition of a Böhm
tree is extended to the nondeterministic case by means of inductively defined "approximating
operators". Several relevant properties such as discriminability are studied and, moreover, some
suitable models for the nondeterministic lambda calculus are proposed and discussed. In [7], Di Pierro,
Hankin and Wiklicky propose an untyped lambda calculus with probabilistic choice. Its syntax is
itself an extension of pure lambda calculus with $n$-ary probabilistic choice in the form $\bigoplus_{i=1}^{n} p_i \cdot M_i$.
The main objective, however, is to show how probabilistic abstract interpretation can be exploited
in the context of static analysis of probabilistic programs, even in the presence of higher-order functions.
This is reflected by its operational semantics, which is oriented more towards program analysis (when
can two terms be considered equal?) than towards computation (what is the value obtained by evaluating
a program? how can we compute it?).
1.2 Outline



After some motivating observations about the interplay between rewriting and nondeterministic
choice (Section 2), the calculus $\Lambda_\oplus$ and its call-by-value probabilistic operational semantics are
introduced (Section 4 and Section 5, respectively). Both small-step and big-step semantics are
defined and their equivalence is proved in detail. Remarkably, the result holds even when operational
semantics is formulated coinductively. The same results hold for call-by-name evaluation and are
described in Section 6. Call-by-value and call-by-name can be shown to be able to simulate each
other by slight modifications of the well-known CPS translations [17], described in Section 7. The
paper ends with a result about the expressive power of the calculus, namely the equivalence between
representable and computable distributions.



2 Some Motivating Observations 

Lambda calculus can be seen both as an equational theory on lambda terms and as an abstract
model of computation. In the first case, it is natural (and customary) to allow equations
(e.g. $\beta$ or $\eta$ equivalences) to be applied at any position in the term. The obtained calculus enjoys
confluence, in the form of the so-called Church-Rosser theorem: equality of terms remains transitive
even if equations become rewriting rules, i.e. if they are given an orientation. More computationally,
on the other hand, the meaning of any lambda term is the value it evaluates to in some strategy
or machine. In this setting, abstractions are often considered as values, meaning that reduction
cannot take place in the scope of a lambda abstraction. What is obtained this way is a calculus
with weak reduction which is not confluent in the Church-Rosser sense. As an example, take the
term $(\lambda x.\lambda y.x)((\lambda z.z)(\lambda z.z))$. In call-by-value, it reduces to $\lambda y.(\lambda z.z)$. In call-by-name, it reduces
to $\lambda y.(\lambda z.z)(\lambda z.z)$. A beautiful operational theory has developed since Plotkin's pioneering
work [17]. Call-by-value and call-by-name have been shown to be dual to each other [2], and
continuation-based translations allowing one style to be simulated by the other have been designed
and analyzed very carefully [3J H] . 

Now, suppose we endow the lambda calculus with nondeterministic sums. Suppose, in other
words, that we introduce a binary infix term operator $\oplus$ such that $M \oplus N$ can act either as $M$ or
as $N$, nondeterministically. What we obtain is the so-called nondeterministic lambda calculus,
which has been introduced and studied by de'Liguoro and Piperno. We cannot hope to get any
confluence result in this setting (at least if we stick to reduction as a binary relation on terms):
a term like $M = M_\top \oplus M_\bot$ (where $M_\top = \lambda x.\lambda y.x$ and $M_\bot = \lambda x.\lambda y.y$ are the usual
representations of truth values in the lambda calculus) reduces to two distinct values (which are
different in a very strong sense!) in any strategy. The meaning of any lambda term, here, is a set of
values accounting for all the possible outcomes. The meaning of $M$, as an example, is the set
$\{M_\top, M_\bot\}$. Nontermination, mixed with nondeterministic choice, implies that the meaning of
terms can even be an infinite set of values. As an example, take the lambda term

$$(Y(\lambda x.\lambda y.(y \oplus x(M_{succ}\, y))))V_0$$

where $Y$ is a fixed-point combinator, $V_n$ represents the natural number $n \in \mathbb{N}$ and $M_{succ}$ computes
the successor. Evaluating it (e.g. in call-by-value) produces the infinite set

$$\{V_0, V_1, V_2, \ldots\}.$$

It is clear that ordinary ways to give an operational semantics to the lambda calculus (e.g. a finitary,
inductively defined, formal system) do not suffice here, since they intrinsically attribute a finitary
meaning to terms. So, how can we define small-step and big-step semantics in a nondeterministic
setting?

Another problematic point is confluence. The situation is even worse than in the ordinary,
deterministic lambda calculus. Take the following term [201 151]:

$$M = (\lambda x.M_{xor}\, x\, x)(M_\top \oplus M_\bot)$$

where $M_{xor} = \lambda x.\lambda y.(x(\lambda z.z M_\bot M_\top)(\lambda z.z M_\top M_\bot))y$ is a term computing the parity function of
the two bits in input. When reducing it call-by-value, we obtain the outcome $\{M_\bot\}$, while
reducing it call-by-name, we obtain $\{M_\top, M_\bot\}$. This phenomenon is due to the interaction between
nondeterministic choice and copying: in call-by-value we choose before copying, and the final result
can only be of a certain form. In call-by-name, we copy before choosing, getting distinct outcomes.
What happens to CPS translations in this setting? Is it still possible to define them?

The aim of this paper is precisely to give answers to the questions above or, better, to their
natural quantitative generalizations, obtained by considering $\oplus$ as an operator producing
either of its two possible outcomes with identical probability.
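The quantitative version of the copying phenomenon can be checked by plain enumeration. The sketch below is a hypothetical Boolean abstraction, not part of the calculus: `True` stands for $M_\top$, `False` for $M_\bot$, `xor` for $M_{xor}$, and each resolution of $\oplus$ is assumed to pick either side with probability 1/2. Under call-by-value the choice is resolved once and the chosen value is copied; under call-by-name each copy of the unevaluated choice is resolved independently.

```python
from fractions import Fraction
from itertools import product


def xor(a: bool, b: bool) -> bool:
    """Boolean stand-in for M_xor: the parity of two bits."""
    return a != b


def cbv_outcome() -> dict:
    """(λx.M_xor x x)(M_T ⊕ M_F) in call-by-value: choose once, then copy."""
    dist: dict = {}
    for b in (True, False):          # one fair choice between M_T and M_F
        r = xor(b, b)                # both copies of x are the same chosen bit
        dist[r] = dist.get(r, Fraction(0)) + Fraction(1, 2)
    return dist


def cbn_outcome() -> dict:
    """Same term in call-by-name: copy the unevaluated choice, then resolve each copy."""
    dist: dict = {}
    for b1, b2 in product((True, False), repeat=2):  # two independent fair choices
        r = xor(b1, b2)
        dist[r] = dist.get(r, Fraction(0)) + Fraction(1, 4)
    return dist


print(cbv_outcome())  # {False: Fraction(1, 1)}
print(cbn_outcome())  # {False: Fraction(1, 2), True: Fraction(1, 2)}
```

So the call-by-value outcome $\{M_\bot\}$ becomes the distribution $\{M_\bot^1\}$, while the call-by-name outcome $\{M_\top, M_\bot\}$ becomes $\{M_\top^{1/2}, M_\bot^{1/2}\}$.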

3 A Brief Introduction to Coinduction 

Coinduction is a definitional principle dual to induction, which supports a proof principle. It is
a very useful instrument for reasoning about diverging computations and unbounded structures.
Coinductive techniques are not yet as popular as inductive techniques; for this reason, in this
section we give a short introduction to coinduction, following [11].

Let $\mathcal{U}$ be a set of judgments. An inference rule over $\mathcal{U}$ is a pair $(A, c)$, where $c \in \mathcal{U}$ is the
conclusion of the rule and $A \subseteq \mathcal{U}$ is the set of its premises or antecedents. An inference system
$\Phi$ over $\mathcal{U}$ is a set of inference rules over $\mathcal{U}$.

The usual way to give meaning to an inference system is to consider the fixed points of
the associated inference operator. If $\Phi$ is an inference system over $\mathcal{U}$, we define the operator
$F_\Phi : \wp(\mathcal{U}) \to \wp(\mathcal{U})$ as

$$F_\Phi(A) = \{c \in \mathcal{U} \mid \exists B \subseteq A,\ (B, c) \in \Phi\}.$$

In other words, $F_\Phi(A)$ is the set of judgments that can be inferred in one step from the judgments
in $A$ by using the inference rules. A set $A$ is said to be closed if $F_\Phi(A) \subseteq A$, and consistent if
$A \subseteq F_\Phi(A)$. A closed set $A$ is such that no new judgments can be inferred from $A$. A consistent set
$A$ is such that every judgment in $A$ can be inferred in one step from judgments in $A$. The inference operator
is monotone: $F_\Phi(A) \subseteq F_\Phi(B)$ if $A \subseteq B$. By Tarski's fixed point theorem for complete lattices, it
follows that the inference operator possesses both a least fixed point and a greatest fixed point,
which are the smallest closed set and the largest consistent set, respectively:

$$\mathrm{lfp}(F_\Phi) = \bigcap\{A \mid F_\Phi(A) \subseteq A\}; \qquad \mathrm{gfp}(F_\Phi) = \bigcup\{A \mid A \subseteq F_\Phi(A)\}.$$

The least fixed point $\mathrm{lfp}(F_\Phi)$ is the inductive interpretation of the inference system $\Phi$, and the
greatest fixed point $\mathrm{gfp}(F_\Phi)$ is its coinductive interpretation.

These interpretations lead to the following two proof principles: 

• Induction principle: to prove that all judgments in the inductive interpretation belong to a
set $A$, show that $A$ is $F_\Phi$-closed.

• Coinduction principle: to prove that all judgments in a set $A$ belong to the coinductive
interpretation, show that $A$ is $F_\Phi$-consistent. Indeed, if $A$ is $F_\Phi$-consistent, then

$$A \subseteq F_\Phi(A) \;\Rightarrow\; A \in \{B \mid B \subseteq F_\Phi(B)\} \;\Rightarrow\; A \subseteq \bigcup\{B \mid B \subseteq F_\Phi(B)\} = \mathrm{gfp}(F_\Phi).$$
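When the set of judgments is finite, both interpretations can be computed by iterating $F_\Phi$: starting from $\emptyset$ the iteration climbs to $\mathrm{lfp}(F_\Phi)$, and starting from $\mathcal{U}$ it descends to $\mathrm{gfp}(F_\Phi)$. On infinite sets, such as the sets of judgments used in the rest of this paper, such iterations need not stabilize after finitely many steps, so this is only an illustration. The sketch assumes judgments are hashable Python values and rules are pairs (premises, conclusion); the three-judgment system is a hypothetical toy in which judgment `c` can only be derived from itself, much like a diverging term whose only derivation is circular.

```python
from typing import FrozenSet, Hashable, Iterable, Set, Tuple

# A rule is a pair (premises, conclusion), as in the definition of an inference system.
Rule = Tuple[FrozenSet[Hashable], Hashable]


def infer(rules: Iterable[Rule], a: Set[Hashable]) -> Set[Hashable]:
    """F_Phi(A): the judgments inferable in one step from judgments in A."""
    return {c for (premises, c) in rules if premises <= a}


def lfp(rules: Iterable[Rule]) -> Set[Hashable]:
    """Inductive interpretation: iterate F_Phi upwards from the empty set."""
    rules = list(rules)
    a: Set[Hashable] = set()
    while (b := infer(rules, a)) != a:
        a = b
    return a


def gfp(rules: Iterable[Rule], universe: Set[Hashable]) -> Set[Hashable]:
    """Coinductive interpretation: iterate F_Phi downwards from the whole universe."""
    rules = list(rules)
    a = set(universe)
    while (b := infer(rules, a) & universe) != a:
        a = b
    return a


def is_consistent(rules: Iterable[Rule], a: Set[Hashable]) -> bool:
    """A is F_Phi-consistent iff A ⊆ F_Phi(A); by coinduction, A ⊆ gfp(F_Phi)."""
    return a <= infer(list(rules), a)


# Toy system: "a" is an axiom, "b" follows from "a", "c" only follows from itself.
RULES = [
    (frozenset(), "a"),
    (frozenset({"a"}), "b"),
    (frozenset({"c"}), "c"),
]
UNIVERSE = {"a", "b", "c"}

print(sorted(lfp(RULES)))            # ['a', 'b']
print(sorted(gfp(RULES, UNIVERSE)))  # ['a', 'b', 'c']
print(is_consistent(RULES, {"c"}))   # True
```

The circular judgment `c` is absent from the inductive interpretation but present in the coinductive one, and the coinduction principle certifies it through the consistent set `{"c"}`.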

Example 1 (Finite and Infinite Trees) Let us consider all those (finite and infinite) trees
whose nodes are labelled with a symbol from the alphabet $\{\circ, \bullet\}$. Examples of finite trees are

$$\circ(\bullet, \circ) \qquad\qquad \bullet(\circ(\bullet, \bullet), \bullet, \circ)$$

while an example of an infinite tree is the (unique!) tree $t$ such that $t = \circ(t, t, t)$. Let $\mathcal{U}$ be the set
of judgments of the form $t\Downarrow$, where $t$ is a possibly infinite tree as above. Moreover, let $\Phi$ be the
inference system composed of all the instances of the two rules below, where $n \geq 0$ (for $n = 0$ the
rules are axioms for the leaves):

$$\frac{t_1\Downarrow \quad \cdots \quad t_n\Downarrow}{\bullet(t_1, \ldots, t_n)\Downarrow} \qquad\qquad \frac{t_1\Downarrow \quad \cdots \quad t_n\Downarrow}{\circ(t_1, \ldots, t_n)\Downarrow}$$

The inductive interpretation of $\Phi$ contains all judgments $t\Downarrow$ where $t$ is a finite tree, while its
coinductive interpretation also contains all judgments $t\Downarrow$ where $t$ is infinite.

4 Syntax and Preliminary Definitions 

In this section we introduce the syntax of $\Lambda_\oplus$, a language of lambda terms with binary choice
introduced by de'Liguoro and Piperno [5]. This is the language whose probabilistic semantics is
the main topic of this paper.

The most important syntactic category is certainly that of terms. Actually, $\Lambda_\oplus$ is nothing
more than the usual untyped and pure lambda calculus, endowed with a binary choice operator $\oplus$
which is meant to represent nondeterministic choice.

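Definition 1 (Terms) Let $\mathcal{X} = \{x, y, \ldots\}$ be a denumerable set of variables. The set $\Lambda_\oplus$ of term
expressions, or terms, is the smallest set such that:

1. if $x \in \mathcal{X}$ then $x \in \Lambda_\oplus$;

2. if $x \in \mathcal{X}$ and $M \in \Lambda_\oplus$, then $\lambda x.M \in \Lambda_\oplus$;

3. if $M, N \in \Lambda_\oplus$ then $MN \in \Lambda_\oplus$;

4. if $M, N \in \Lambda_\oplus$ then $M \oplus N \in \Lambda_\oplus$.

Terms are ranged over by metavariables like $M, N, L$.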

Terms, as usual, are considered modulo renaming of bound variables. The set of free variables
of a term $M$ is indicated as $FV(M)$ and is defined as usual. A term $M$ is closed if $FV(M) = \emptyset$.
The (capture-avoiding) substitution of $N$ for the free occurrences of $x$ in $M$ is denoted $M\{x/N\}$.
Unless otherwise stated, all results in this paper hold only for programs, that is to say for closed
terms. Values are defined in a standard way:

Definition 2 (Values) A term is a value if it is a variable or a lambda abstraction. We will call
$\mathsf{Val}$ the set of all values. Values are ranged over by metavariables like $V, W, X$.
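To make these conventions concrete, here is a minimal sketch of terms, free variables, closedness, the value predicate of Definition 2 and capture-avoiding substitution $M\{x/N\}$. The tuple encoding (`'var'`, `'lam'`, `'app'`, and `'ch'` for $\oplus$) and the fresh-name scheme `v0`, `v1`, ... are illustrative assumptions; terms are compared syntactically here, not modulo renaming of bound variables.

```python
import itertools

# Hypothetical tuple encoding of terms:
#   ('var', x) | ('lam', x, M) | ('app', M, N) | ('ch', M, N)   where ('ch', M, N) is M ⊕ N


def free_vars(m) -> set:
    """FV(M)."""
    tag = m[0]
    if tag == 'var':
        return {m[1]}
    if tag == 'lam':
        return free_vars(m[2]) - {m[1]}
    return free_vars(m[1]) | free_vars(m[2])  # 'app' and 'ch'


def is_closed(m) -> bool:
    return not free_vars(m)


def is_value(m) -> bool:
    """Definition 2: variables and abstractions are values."""
    return m[0] in ('var', 'lam')


def fresh(avoid: set) -> str:
    """Pick a variable name not in `avoid` (the naming scheme is an assumption)."""
    for i in itertools.count():
        name = f"v{i}"
        if name not in avoid:
            return name


def subst(m, x: str, n):
    """M{x/N}: replace the free occurrences of x in m by n, avoiding capture."""
    tag = m[0]
    if tag == 'var':
        return n if m[1] == x else m
    if tag == 'lam':
        y, body = m[1], m[2]
        if y == x:                    # x is rebound here: no free occurrences below
            return m
        if y in free_vars(n) and x in free_vars(body):
            z = fresh(free_vars(body) | free_vars(n) | {x})
            body, y = subst(body, y, ('var', z)), z   # rename the binder first
        return ('lam', y, subst(body, x, n))
    return (tag, subst(m[1], x, n), subst(m[2], x, n))


# (λy.x y){x/y} must not capture y: the binder is renamed.
print(subst(('lam', 'y', ('app', ('var', 'x'), ('var', 'y'))), 'x', ('var', 'y')))
# ('lam', 'v0', ('app', ('var', 'y'), ('var', 'v0')))
```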

The reduction relation $\to$ considered in [5] is obtained by extending usual $\beta$-reduction with two
new reduction rules, namely $M \oplus N \to M$ and $M \oplus N \to N$, which can be applied in every possible
context. In this paper, following Plotkin [17], we concentrate on weak reduction: computation can
only take place in applicative (or choice) contexts.

Notation 1 In the following, we sometimes need to work with finite sequences of terms. A sequence
$M_1, \ldots, M_n$ is denoted as $\vec{M}$. This notation can be used to denote sequences obtained from other
sequences and terms, e.g., $M \oplus \vec{N}$ is $M \oplus N_1, \ldots, M \oplus N_n$ whenever $\vec{N}$ is $N_1, \ldots, N_n$.

4.1 Distributions 

In the probabilistic semantics with which we endow $\Lambda_\oplus$, a program reduces not to a single value but
rather to a distribution of possible observables, i.e. to a function assigning a probability to any
value. This way, all possible outputs of all binary choices are taken into account, each with its own
probability. Divergence is indeed a possibility in the untyped setting, and our definition reflects it.

Definition 3 (Distributions)

1. A probability distribution is a function $\mathscr{D} : \mathsf{Val} \to \mathbb{R}_{[0,1]}$ such that $\sum_{V \in \mathsf{Val}} \mathscr{D}(V) \leq 1$. $\mathscr{P}$ denotes the set of all probability distributions.

2. A proper probability distribution is a probability distribution such that $\sum_{V \in \mathsf{Val}} \mathscr{D}(V) = 1$.

3. Given a probability distribution $\mathscr{D}$, its support $\mathsf{S}(\mathscr{D})$ is the subset of $\mathsf{Val}$ whose elements are values to which $\mathscr{D}$ attributes positive probability.

4. Given a probability distribution $\mathscr{D}$, its sum $\sum\mathscr{D}$ is simply $\sum_{V \in \mathsf{Val}} \mathscr{D}(V)$.

The notion of a probability distribution as we give it is general enough to capture the semantics of
those terms which have some nonzero probability of diverging. We use the expression $\{V_1^{p_1}, \ldots, V_k^{p_k}\}$
to denote the probability distribution with finite support defined as $\mathscr{D}(V) = \sum_{V_i = V} p_i$. Please
observe that $\sum\mathscr{D} = \sum_{i=1}^{k} p_i$.

Sometimes we need to compare distinct distributions. The natural way to do that is just by
lifting the canonical order on $\mathbb{R}$ up to distributions, pointwise:

Definition 4 $\mathscr{D} \leq \mathscr{E}$ iff $\mathscr{D}(V) \leq \mathscr{E}(V)$ for every value $V$.

The structure $(\mathscr{P}, \leq)$ is a partial order, but not a lattice [5]: the join of two distributions $\mathscr{D}$ and $\mathscr{E}$
does not necessarily exist in $\mathscr{P}$. However, $(\mathscr{P}, \leq)$ is a complete meet-semilattice, since meets are
always guaranteed to exist: for every $A \subseteq \mathscr{P}$, the greatest lower bound of distributions in $A$ is itself
a distribution. Please observe, on the other hand, that all functions from $\mathsf{Val}$ to $\mathbb{R}_{[0,\infty]}$ actually form
a complete lattice, and that $\sum$, as a function from those functions to $\mathbb{R}_{[0,\infty]}$, is monotone and
preserves suprema of directed sets.
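Finite-support distributions, their sum and support, the pointwise order of Definition 4 and binary meets can be sketched with dictionaries mapping values to exact rationals. In this hypothetical sketch, values are plain strings standing for closed values, and absent keys mean probability 0. The last lines illustrate why joins may fail to exist: any upper bound of two distributions must dominate their pointwise maximum, whose sum here already exceeds 1.

```python
from fractions import Fraction

Dist = dict  # value -> Fraction in [0, 1]; missing values have probability 0


def total(d: Dist) -> Fraction:
    """The sum of a distribution."""
    return sum(d.values(), Fraction(0))


def support(d: Dist) -> set:
    return {v for v, p in d.items() if p > 0}


def is_distribution(d: Dist) -> bool:
    return all(0 <= p <= 1 for p in d.values()) and total(d) <= 1


def is_proper(d: Dist) -> bool:
    return is_distribution(d) and total(d) == 1


def leq(d: Dist, e: Dist) -> bool:
    """Pointwise order: d(V) <= e(V) for every value V."""
    return all(p <= e.get(v, 0) for v, p in d.items())


def meet(d: Dist, e: Dist) -> Dist:
    """Greatest lower bound: the pointwise minimum, which is always a distribution."""
    return {v: min(p, e[v]) for v, p in d.items() if v in e and min(p, e[v]) > 0}


def pointwise_max(d: Dist, e: Dist) -> Dist:
    return {v: max(d.get(v, 0), e.get(v, 0)) for v in set(d) | set(e)}


D = {"V": Fraction(1, 2), "W": Fraction(1, 2)}
E = {"V": Fraction(1)}
print(meet(D, E))                              # {'V': Fraction(1, 2)}
print(is_distribution(pointwise_max(D, E)))    # False: its sum is 3/2
```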

5 Call-by-Value

In this section, four ways to give a probabilistic semantics to $\Lambda_\oplus$ are introduced, all of them
following the so-called call-by-value parameter-passing discipline.

A (weak) call-by-value notion of reduction can be obtained from ordinary reduction by restricting
it in such a way that only values are passed to functions. Accordingly, choices are only made when
both alternatives are themselves values:

Definition 5 (Call-by-value Reduction) Leftmost reduction $\mapsto_v$ is the least binary relation
between $\Lambda_\oplus$ and finite sequences of terms in $\Lambda_\oplus$ such that:

$$
\begin{aligned}
(\lambda x.M)V &\mapsto_v M\{x/V\} & V \oplus W &\mapsto_v V, W\\
MN &\mapsto_v \vec{L}N \ \text{ if } M \mapsto_v \vec{L} & M \oplus N &\mapsto_v \vec{L} \oplus N \ \text{ if } M \mapsto_v \vec{L}\\
VM &\mapsto_v V\vec{N} \ \text{ if } M \mapsto_v \vec{N} & V \oplus M &\mapsto_v V \oplus \vec{N} \ \text{ if } M \mapsto_v \vec{N}
\end{aligned}
$$

where $V, W \in \mathsf{Val}$, and sequences are used as in Notation 1.

Please notice that reduction is not probabilistic. In fact, reduction is a relation between terms and
unlabeled sequences of terms, without any reference to probability. Informally, $M \mapsto_v N_1 \ldots N_n$
means that $M$ rewrites in one step to every $N_i$ with the same probability $1/n$. Clearly, $n \in \{1, 2\}$
whenever $M \mapsto_v N_1 \ldots N_n$. Notice again how the evaluation of both branches of a binary choice
is done before performing the choice itself. On the one hand, this is very much in the style of
call-by-value evaluation. On the other, a more standard notion of choice, which is performed before
evaluating the branches, can be easily encoded as follows:

$$M + N = ((\lambda x.\lambda y.x) \oplus (\lambda x.\lambda y.y))(\lambda z.M)(\lambda z.N)(\lambda w.w),$$

where $z$ does not appear free in $M$ nor in $N$.

5.1 CbV Small-step Semantics

Following the general methodology described in [11], we model convergence and divergence
separately. Finite computations are, as usual, inductively defined, while divergence can be
captured by interpreting a different set of rules coinductively. Both definitions give some quantitative
information about the dynamics of any term $M$: either a distribution of possible outcomes is
associated to $M$, or the probability of divergence of $M$ is derived. Both induction and coinduction
can be used to characterize the distribution of values a term evaluates to. In an inductive
characterization, one is allowed to underapproximate the target distribution, but then takes an
upper bound of all the approximations. In a coinductive characterization, on the other hand, one
can naturally overapproximate the distribution, then taking a lower bound.
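The one-step relation $\mapsto_v$ of Definition 5 and the inductive under-approximation described above can be sketched together. The hypothetical function `approx(M, k)` computes the distribution derived for `M` by the inductive rules of Section 5.1.1 when every branch may perform at most `k` reduction steps, giving up (rule $\mathsf{se_{iv}}$) when the budget runs out; the results grow with `k`, and their supremum is the small-step semantics. Assumptions of this sketch: closed terms only, a hypothetical tuple encoding, substitution of closed values only (so no capture can occur), and values compared syntactically rather than up to renaming of bound variables.

```python
from fractions import Fraction


# Hypothetical encoding: ('var', x) | ('lam', x, M) | ('app', M, N) | ('ch', M, N) for M ⊕ N.
def var(x):
    return ('var', x)


def lam(x, m):
    return ('lam', x, m)


def app(m, n):
    return ('app', m, n)


def ch(m, n):
    return ('ch', m, n)


def is_value(m) -> bool:
    return m[0] in ('var', 'lam')


def subst(m, x, v):
    """M{x/V} for a CLOSED value v: no variable capture is possible."""
    tag = m[0]
    if tag == 'var':
        return v if m[1] == x else m
    if tag == 'lam':
        return m if m[1] == x else ('lam', m[1], subst(m[2], x, v))
    return (tag, subst(m[1], x, v), subst(m[2], x, v))


def step(m) -> list:
    """Leftmost call-by-value reduction: the list of reducts N1..Nn (n in {1, 2}).
    Returns [] for values and for stuck (open) terms."""
    tag = m[0]
    if tag not in ('app', 'ch'):
        return []
    left, right = m[1], m[2]
    if not is_value(left):                       # reduce the left subterm first
        return [(tag, s, right) for s in step(left)]
    if not is_value(right):                      # then the right one
        return [(tag, left, s) for s in step(right)]
    if tag == 'ch':                              # V ⊕ W  ↦v  V, W
        return [left, right]
    if left[0] == 'lam':                         # (λx.M)V  ↦v  M{x/V}
        return [subst(left[2], left[1], right)]
    return []


def approx(m, k: int) -> dict:
    """Distribution derived for m by the inductive rules, with at most k steps per branch."""
    if is_value(m):
        return {m: Fraction(1)}                  # V ⇒ {V^1}
    reducts = step(m)
    if k == 0 or not reducts:
        return {}                                # give up: M ⇒ ∅
    out: dict = {}
    for n in reducts:                            # M ⇒ Σ (1/n)·D_i
        for v, p in approx(n, k - 1).items():
            out[v] = out.get(v, Fraction(0)) + p / len(reducts)
    return out


I = lam('x', var('x'))
TRUE, FALSE = lam('x', lam('y', var('x'))), lam('x', lam('y', var('y')))
DELTA = lam('x', app(var('x'), var('x')))
OMEGA = app(DELTA, DELTA)


def plus(m, n):
    """M + N: choose first, then evaluate only the chosen branch (z not free in m, n)."""
    return app(app(app(ch(TRUE, FALSE), lam('z', m)), lam('z', n)), lam('w', var('w')))


print(approx(OMEGA, 50))               # {}: Ω never reaches a value
print(approx(plus(OMEGA, I), 10)[I])   # 1/2
```

Because `step` always reduces the leftmost non-value subterm, reduction is deterministic as a map from terms to sequences of terms; probability enters only through the weights $1/n$ in `approx`. The term $\Omega + (\lambda x.x)$ converges to $\lambda x.x$ with probability 1/2, and the missing mass 1/2 is its probability of divergence.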



5.1.1 Inductive CbV Small-Step Semantics for Convergence 

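A small-step semantics for convergence is captured here by way of a binary relation $\Rightarrow_{\mathsf{iv}}$ between
terms in $\Lambda_\oplus$ and distributions. The relation $\Rightarrow_{\mathsf{iv}}$ is defined as the inductive interpretation of the
inference system whose rules are all possible instances of the following ones:

$$\frac{}{M \Rightarrow_{\mathsf{iv}} \emptyset}\;\mathsf{se_{iv}} \qquad\qquad \frac{}{V \Rightarrow_{\mathsf{iv}} \{V^1\}}\;\mathsf{sv_{iv}} \qquad\qquad \frac{M \mapsto_v \vec{N} \qquad N_i \Rightarrow_{\mathsf{iv}} \mathscr{D}_i \ (1 \leq i \leq n)}{M \Rightarrow_{\mathsf{iv}} \sum_{i=1}^{n} \frac{1}{n}\mathscr{D}_i}\;\mathsf{sm_{iv}}$$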

As usual, $\vec{N}$ stands for the sequence $N_1, \ldots, N_n$. Since the relation $\Rightarrow_{\mathsf{iv}}$ is inductively defined, any
proof of a judgment involving $\Rightarrow_{\mathsf{iv}}$ is a finite object. If $\pi$ is such a proof for $M \Rightarrow_{\mathsf{iv}} \mathscr{D}$, we write
$\pi : M \Rightarrow_{\mathsf{iv}} \mathscr{D}$. The proof $\pi$ is said to be structurally smaller than or equal to another proof $\rho$ if the
number of rule instances in $\pi$ is smaller than or equal to the number of rule instances in $\rho$. In this case,
we write $\pi \preceq \rho$.

First of all, observe that $\Rightarrow_{\mathsf{iv}}$ is not a function: many different distributions can be put in
correspondence with the same term $M$. Moreover, there is one distribution $\mathscr{D}$ such that $M \Rightarrow_{\mathsf{iv}} \mathscr{D}$
always holds, independently of $M$, namely $\emptyset$. Actually, rule $\mathsf{se_{iv}}$ allows you to "give up" while
looking for a distribution for $M$ and conclude that $M$ is in relation with $\emptyset$. In other words, $\Rightarrow_{\mathsf{iv}}$ is
not meant to be a way to attribute one distribution to every term, but rather to find all finitary
approximants of the (unique!) distribution we are looking for (see Definition 6).

The set of the distributions that a term evaluates to is a directed set:



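Lemma 1 For every term $M$, if $M \Rightarrow_{\mathsf{iv}} \mathscr{D}$ and $M \Rightarrow_{\mathsf{iv}} \mathscr{E}$, then there exists a distribution $\mathscr{F}$ such
that $M \Rightarrow_{\mathsf{iv}} \mathscr{F}$ with $\mathscr{D} \leq \mathscr{F}$ and $\mathscr{E} \leq \mathscr{F}$.

Proof. By induction on the structure of derivations for $M \Rightarrow_{\mathsf{iv}} \mathscr{D}$.

• If $M \Rightarrow_{\mathsf{iv}} \emptyset$ by rule $\mathsf{se_{iv}}$, then $\mathscr{F} = \mathscr{E}$;

• If $M$ is a value $V$ and $M \Rightarrow_{\mathsf{iv}} \{V^1\}$, then either $\mathscr{E} = \emptyset$ or $\mathscr{E} = \{V^1\}$, and $\mathscr{F} = \mathscr{D}$;

• If $M \mapsto_v N_1, \ldots, N_k$, $N_i \Rightarrow_{\mathsf{iv}} \mathscr{D}_i$ for $i = 1, \ldots, k$, and $\mathscr{D} = \sum_{i=1}^{k}\frac{1}{k}\mathscr{D}_i$, there are two cases:

  • If $\mathscr{E} = \emptyset$, then $\mathscr{F} = \mathscr{D}$;

  • If $\mathscr{E} \neq \emptyset$, then $\mathscr{E} = \sum_{i=1}^{k}\frac{1}{k}\mathscr{E}_i$, where $N_i \Rightarrow_{\mathsf{iv}} \mathscr{E}_i$ for $i = 1, \ldots, k$ (the reducts are the same, since $\mapsto_v$ is deterministic). Now, by inductive hypothesis, there exist distributions $\mathscr{F}_i$ such that $N_i \Rightarrow_{\mathsf{iv}} \mathscr{F}_i$, $\mathscr{D}_i \leq \mathscr{F}_i$ and $\mathscr{E}_i \leq \mathscr{F}_i$ for $i = 1, \ldots, k$. We take $\mathscr{F} = \sum_{i=1}^{k}\frac{1}{k}\mathscr{F}_i$: then $M \Rightarrow_{\mathsf{iv}} \mathscr{F}$ by rule $\mathsf{sm_{iv}}$, and by definition $\mathscr{D} \leq \mathscr{F}$ and $\mathscr{E} \leq \mathscr{F}$.

This concludes the proof. □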

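We are now ready to define what the small-step semantics of any term is:

Definition 6 The (call-by-value) small-step semantics of a lambda term $M \in \Lambda_\oplus$ is the distribution
$\mathcal{S}_{\mathsf{iv}}(M)$ defined as $\sup_{M \Rightarrow_{\mathsf{iv}} \mathscr{D}} \mathscr{D}$.

Please observe that such a distribution is always guaranteed to exist, precisely because of
Lemma 1. Indeed, since the set of distributions $\mathscr{D}$ such that $M \Rightarrow_{\mathsf{iv}} \mathscr{D}$ is directed, the sum commutes with its supremum:

$$\sum\Big(\sup_{M \Rightarrow_{\mathsf{iv}} \mathscr{D}} \mathscr{D}\Big) = \sup_{M \Rightarrow_{\mathsf{iv}} \mathscr{D}}\Big(\sum\mathscr{D}\Big) \leq 1,$$

since $\sum\mathscr{D}$ is at most 1, by hypothesis.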

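Example 2 The term $(\lambda x.x)(\lambda x.x)$ evaluates to $\mathscr{D} = \emptyset$, by means of rule $\mathsf{se_{iv}}$, and to $\mathscr{E} = \{(\lambda x.x)^1\}$,
by means of rules $\mathsf{sm_{iv}}$ and $\mathsf{sv_{iv}}$. By definition, $\mathcal{S}_{\mathsf{iv}}((\lambda x.x)(\lambda x.x)) = \mathscr{E}$.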
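5.1.2 Coinductive CbV Small-step Semantics for Divergence

Divergence is captured by another, coinductively defined binary relation $\Rightarrow^{\infty}$ between $\Lambda_\oplus$ and
$\mathbb{R}_{[0,1]}$; we write $M \Rightarrow^{\infty}_{p}$ when $M$ is in relation with $p$. The rules defining the underlying inference
system are the following:

$$\frac{}{V \Rightarrow^{\infty}_{0}}\;\mathsf{dv_v} \qquad\qquad \frac{M \mapsto_v \vec{N} \qquad N_i \Rightarrow^{\infty}_{p_i}\ (1 \leq i \leq n)}{M \Rightarrow^{\infty}_{\sum_{i=1}^{n} \frac{1}{n} p_i}}\;\mathsf{dm_v}$$

where $\vec{N}$ is $N_1, \ldots, N_n$.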

The relation $\Rightarrow^{\infty}$ deals naturally with infinite computations, being coinductively defined. This
allows us to derive the divergence probability $p$ of a term in $\Lambda_\oplus$. Rules $\mathsf{dv_v}$ and $\mathsf{dm_v}$ can be read as
follows: values diverge with probability 0, while the probability of divergence of a term $M$ is equal to
the normalized sum of its reducts' probabilities of divergence. The following example stresses a
crucial point:

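Example 3 Let us consider the well-known diverging term $\Omega = \Delta\Delta$, where $\Delta = \lambda x.xx$. We would
like to be able to prove that $\Omega \Rightarrow^{\infty}_{1}$, namely that $\Omega$ diverges with probability 1. Doing that formally
requires proving that some set of judgments $A$ including $\Omega \Rightarrow^{\infty}_{1}$ is consistent with respect to $\Rightarrow^{\infty}$.
Actually we can choose $A$ as $\{\Omega \Rightarrow^{\infty}_{1}\}$, since $\Omega \mapsto_v \Omega$ and hence $\Omega \Rightarrow^{\infty}_{1}$ can be easily derived from itself by rule $\mathsf{dm_v}$.
The trouble is that in the same way we can derive $\Omega \Rightarrow^{\infty}_{p}$ for every possible $0 \leq p \leq 1$. So, in a
sense, $\Rightarrow^{\infty}$ is inconsistent.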

Example 3 shows that not all probabilities attributed to terms via $\Rightarrow^{\infty}$ are accurate. Actually,
a good definition consists in taking the divergence probability of any term $M$, $\mathcal{D}_{\mathsf{v}}(M)$, simply as
$\sup_{M \Rightarrow^{\infty}_{p}} p$. As an example, $\mathcal{D}_{\mathsf{v}}(\Omega) = 1$, since $\Omega \Rightarrow^{\infty}_{1}$ and, clearly, one cannot go beyond 1.

5.1.3 Coinductive CbV Small-Step Semantics for Convergence 

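If one takes the inductive semantics from Section 5.1.1, drops $\mathsf{se_{iv}}$ and interprets everything
coinductively, what comes out is an alternative semantics for convergence:

$$\frac{}{V \Rightarrow_{\mathsf{cv}} \{V^1\}}\;\mathsf{sv_{cv}} \qquad\qquad \frac{M \mapsto_v \vec{N} \qquad N_i \Rightarrow_{\mathsf{cv}} \mathscr{D}_i\ (1 \leq i \leq n)}{M \Rightarrow_{\mathsf{cv}} \sum_{i=1}^{n}\frac{1}{n}\mathscr{D}_i}\;\mathsf{sm_{cv}}$$

Interpreting everything coinductively has the effect of allowing infinite computations to be modeled.
But this allows one to "promise" to reach a certain distribution without really being able to fulfill the promise:

Example 4 Consider again $\Omega$. One would like to be able to prove that $\Omega \Rightarrow_{\mathsf{cv}} \emptyset$. Unfortunately,
the coinductive interpretation of the formal system above contains $\Omega \Rightarrow_{\mathsf{cv}} \mathscr{D}$ for every distribution
$\mathscr{D}$, as can be easily verified: since $\Omega \mapsto_v \Omega$, the set $\{\Omega \Rightarrow_{\mathsf{cv}} \mathscr{D}\}$ is consistent, its only judgment being derivable from itself by rule $\mathsf{sm_{cv}}$.

The solution to the problem highlighted by the example above is just defining the coinductive
semantics of any term $M$ as $\mathcal{S}_{\mathsf{cv}}(M) = \inf_{M \Rightarrow_{\mathsf{cv}} \mathscr{D}} \mathscr{D}$. Clearly, $\mathcal{S}_{\mathsf{cv}}(\Omega) = \emptyset$.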



5.2 CbV Big-Step Semantics 

An alternative style to give semantics to programming languages is the so-called big-step semantics.
Big-step semantics is more compositional than small-step semantics: the meaning of a term can be obtained
from the meanings of its subterms.

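Probabilistic big-step semantics for $\Lambda_\oplus$ can be given by a binary relation $\Downarrow_{\mathsf{cv}}$ between $\Lambda_\oplus$ and
distributions. It is the coinductive interpretation of (all instances of) the following rules:

$$\frac{}{V \Downarrow_{\mathsf{cv}} \{V^1\}}\;\mathsf{bv_v}$$

$$\frac{M \Downarrow_{\mathsf{cv}} \mathscr{D} \qquad N \Downarrow_{\mathsf{cv}} \mathscr{E} \qquad \{P\{x/V\} \Downarrow_{\mathsf{cv}} \mathscr{F}_{P,V}\}_{\lambda x.P \in \mathsf{S}(\mathscr{D}),\, V \in \mathsf{S}(\mathscr{E})}}{MN \Downarrow_{\mathsf{cv}} \sum_{\lambda x.P \in \mathsf{S}(\mathscr{D}),\, V \in \mathsf{S}(\mathscr{E})} \mathscr{D}(\lambda x.P)\cdot\mathscr{E}(V)\cdot\mathscr{F}_{P,V}}\;\mathsf{ba_v}$$

$$\frac{M \Downarrow_{\mathsf{cv}} \mathscr{D} \qquad N \Downarrow_{\mathsf{cv}} \mathscr{E}}{M \oplus N \Downarrow_{\mathsf{cv}} \frac{\sum\mathscr{E}}{2}\mathscr{D} + \frac{\sum\mathscr{D}}{2}\mathscr{E}}\;\mathsf{bs_v}$$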

The most interesting rule is definitely $\mathsf{ba_v}$: to give semantics to an application $MN$, we first give
semantics to $M$ and $N$, obtaining two distributions $\mathscr{D}$ and $\mathscr{E}$, respectively. Then, for every $\lambda x.P$
in $\mathsf{S}(\mathscr{D})$ and for every $V$ in $\mathsf{S}(\mathscr{E})$, we evaluate $P\{x/V\}$, obtaining some distributions $\mathscr{F}_{P,V}$. The
meaning of $MN$ is nothing more than the sum of all such distributions $\mathscr{F}_{P,V}$, each weighted by
the probability of getting $\lambda x.P$ and $V$ in $\mathscr{D}$ and $\mathscr{E}$, respectively. This way of defining the big-step
semantics of applications can be made simpler, at the price of making the task of deriving semantic
assertions harder, by replacing the premise $\{P\{x/V\} \Downarrow_{\mathsf{cv}} \mathscr{F}_{P,V}\}_{\lambda x.P \in \mathsf{S}(\mathscr{D}),\, V \in \mathsf{S}(\mathscr{E})}$ of rule $\mathsf{ba_v}$ with
$\{P\{x/V\} \Downarrow_{\mathsf{cv}} \mathscr{F}_{P,V}\}_{P \in \Lambda_\oplus,\, V \in \mathsf{Val}}$.

Another interesting rule is $\mathsf{bs_v}$. Please observe how the distributions $\mathscr{D}$ and $\mathscr{E}$ must be weighted
by $\sum\mathscr{E}$ and $\sum\mathscr{D}$ (respectively) when computing the result. This reflects call-by-value evaluation:
if either $M$ or $N$ diverges, their sum must diverge, too.
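The conclusions of rules $\mathsf{ba_v}$ and $\mathsf{bs_v}$ are simple operations on distributions, sketched below with hypothetical names. Values are represented by plain strings, and the distribution $\mathscr{F}_{P,V}$ for each body instance $P\{x/V\}$ is supplied by a caller-provided function `body_semantics(f, v)`, an assumption standing in for the premises of $\mathsf{ba_v}$. Only the arithmetic in the conclusions is modelled, not the coinductive derivations themselves.

```python
from fractions import Fraction
from typing import Callable, Dict, Hashable

Dist = Dict[Hashable, Fraction]


def total(d: Dist) -> Fraction:
    return sum(d.values(), Fraction(0))


def _clean(d: Dist) -> Dist:
    """Drop zero entries, so that the result only lists its support."""
    return {v: p for v, p in d.items() if p > 0}


def choice_rule(d: Dist, e: Dist) -> Dist:
    """Conclusion of bs_v: M ⊕ N gets (ΣE/2)·D + (ΣD/2)·E."""
    out: Dist = {}
    for v, p in d.items():
        out[v] = out.get(v, Fraction(0)) + total(e) / 2 * p
    for v, p in e.items():
        out[v] = out.get(v, Fraction(0)) + total(d) / 2 * p
    return _clean(out)


def application_rule(d: Dist, e: Dist,
                     body_semantics: Callable[[Hashable, Hashable], Dist]) -> Dist:
    """Conclusion of ba_v: sum over λx.P in S(D), V in S(E) of D(λx.P)·E(V)·F_{P,V}."""
    out: Dist = {}
    for f, p in _clean(d).items():
        for v, q in _clean(e).items():
            for w, r in body_semantics(f, v).items():
                out[w] = out.get(w, Fraction(0)) + p * q * r
    return _clean(out)


print(choice_rule({"T": Fraction(1)}, {"F": Fraction(1)}))  # T and F, each with probability 1/2
print(choice_rule({"T": Fraction(1)}, {}))                  # {}: a diverging branch makes M ⊕ N diverge
```

One consequence of the weighting is that the sum of `choice_rule(d, e)` is always $\sum\mathscr{D} \cdot \sum\mathscr{E}$: the choice converges exactly when both branches do.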

Like $\Rightarrow_{\mathsf{iv}}$ and $\Rightarrow_{\mathsf{cv}}$, the relation $\Downarrow_{\mathsf{cv}}$ is not a function: many possible distributions can be
assigned to the same term $M$, in particular when $M$ (possibly) diverges. Indeed, distributions
which somehow overapproximate the "real one" can always be attributed to $M$ by the rules above.
Thus, we define the big-step semantics of each lambda term in the following way:

Definition 7 The (call-by-value) coinductive big-step semantics of a lambda term $M \in \Lambda_\oplus$ is the
distribution $\mathcal{B}_{\mathsf{cv}}(M)$ defined as $\inf_{M \Downarrow_{\mathsf{cv}} \mathscr{D}} \mathscr{D}$.

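Example 5 Let us consider again $\Omega$, from Example 3. As can be verified, $\Omega \Downarrow_{\mathsf{cv}} \mathscr{D}$ for all
possible distributions $\mathscr{D}$. To formally prove that, we need to find a consistent set $A$ (with respect to
$\Downarrow_{\mathsf{cv}}$) containing all judgments of the form $\Omega \Downarrow_{\mathsf{cv}} \mathscr{D}$. $A$ is actually the set

$$\{\Delta \Downarrow_{\mathsf{cv}} \{\Delta^1\}\} \cup \{\Omega \Downarrow_{\mathsf{cv}} \mathscr{D} \mid \mathscr{D} \in \mathscr{P}\}.$$

Clearly, $A$ is consistent, since any judgment in $A$ can be obtained from other judgments in $A$ in
one deduction step:

• $\Delta \Downarrow_{\mathsf{cv}} \{\Delta^1\}$ by rule $\mathsf{bv_v}$.

• From $\Delta \Downarrow_{\mathsf{cv}} \{\Delta^1\}$ and $\Omega \Downarrow_{\mathsf{cv}} \mathscr{D}$, one can easily derive $\Omega \Downarrow_{\mathsf{cv}} \mathscr{D}$ by rule $\mathsf{ba_v}$, since $\Omega = \Delta\Delta$ and $(xx)\{x/\Delta\} = \Omega$.

As a consequence, $\mathcal{B}_{\mathsf{cv}}(\Omega) = \inf_{\Omega \Downarrow_{\mathsf{cv}} \mathscr{D}} \mathscr{D} = \emptyset$.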

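One may wonder whether an inductive big-step semantics can be defined for $\Lambda_\oplus$. The answer is
positive: one only needs to add a rule attributing the empty distribution to any term, in the spirit
of the small-step inductive semantics from Section 5.1.1. In other words, we obtain the system

$$\frac{}{M \Downarrow_{\mathsf{iv}} \emptyset} \qquad\qquad \frac{}{V \Downarrow_{\mathsf{iv}} \{V^1\}}\;\mathsf{bv_{iv}}$$

$$\frac{M \Downarrow_{\mathsf{iv}} \mathscr{D} \qquad N \Downarrow_{\mathsf{iv}} \mathscr{E} \qquad \{P\{x/V\} \Downarrow_{\mathsf{iv}} \mathscr{F}_{P,V}\}_{\lambda x.P \in \mathsf{S}(\mathscr{D}),\, V \in \mathsf{S}(\mathscr{E})}}{MN \Downarrow_{\mathsf{iv}} \sum_{\lambda x.P \in \mathsf{S}(\mathscr{D}),\, V \in \mathsf{S}(\mathscr{E})} \mathscr{D}(\lambda x.P)\cdot\mathscr{E}(V)\cdot\mathscr{F}_{P,V}}\;\mathsf{ba_{iv}}$$

$$\frac{M \Downarrow_{\mathsf{iv}} \mathscr{D} \qquad N \Downarrow_{\mathsf{iv}} \mathscr{E}}{M \oplus N \Downarrow_{\mathsf{iv}} \frac{\sum\mathscr{E}}{2}\mathscr{D} + \frac{\sum\mathscr{D}}{2}\mathscr{E}}\;\mathsf{bs_{iv}}$$

As can be expected, the inductive big-step semantics $\mathcal{B}_{\mathsf{iv}}(M)$ of a term $M$ is simply $\sup_{M \Downarrow_{\mathsf{iv}} \mathscr{D}} \mathscr{D}$.
This is similar to the semantics considered by Jones [9].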

5.3 Divergence and Convergence in CbV Small-step Semantics 

In the last two sections, various operational semantics for both convergence and divergence have 
been introduced. Clearly, one would like them to be essentially equivalent, i.e., one would like 
them to attribute the same meaning to programs. 

In this section, divergence and convergence small-step semantics will be compared and proved
equivalent: the probability of divergence of $M$ obtained through $\Rightarrow^{\infty}$ will be proved to be equal to
$1 - \sum\mathcal{S}_{\mathsf{iv}}(M)$. In Section 5.4, small-step and big-step semantics for convergence will be proved to
produce identical outcomes.

The first step consists in proving that $1 - \sum\mathcal{S}_{\mathsf{iv}}(M)$ is a lower bound to the divergence probability
of any term $M$.

Theorem 1 For every term $M$, $M \Rightarrow^{\infty}_{1 - \sum\mathcal{S}_{\mathsf{iv}}(M)}$.

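Proof. We can prove that all judgments $M \Rightarrow^{\infty}_{1 - \sum\mathcal{S}_{\mathsf{iv}}(M)}$ belong to the coinductive interpretation
of the underlying formal system $\Phi$. To do that, we need to prove that the set $A$ of all these
judgments is consistent, i.e. that $A \subseteq F_\Phi(A)$. This amounts to showing that if $c \in A$, then there is a
derivation for $c$ whose immediate premises are themselves in $A$. Let us distinguish two cases:

• If $M$ is a value $V$, then $\mathcal{S}_{\mathsf{iv}}(M) = \{V^1\}$, hence $1 - \sum\mathcal{S}_{\mathsf{iv}}(M) = 0$ and the judgment $V \Rightarrow^{\infty}_{0}$ is obtained by rule $\mathsf{dv_v}$, which has no premises.

• If $M$ is not a value, then $M \mapsto_v \vec{N}$, with $\vec{N} = N_1, \ldots, N_n$. Now, consider the judgments $N_i \Rightarrow^{\infty}_{1 - \sum\mathcal{S}_{\mathsf{iv}}(N_i)}$, with $i \in [1, n]$: they are all in the set $A$. Finally, consider the judgment $M \Rightarrow^{\infty}_{1 - \sum\mathcal{S}_{\mathsf{iv}}(M)}$: it is in $F_\Phi(A)$ because of the presence of rule $\mathsf{dm_v}$:

$$\frac{M \mapsto_v \vec{N} \qquad N_i \Rightarrow^{\infty}_{1 - \sum\mathcal{S}_{\mathsf{iv}}(N_i)}\ (1 \leq i \leq n)}{M \Rightarrow^{\infty}_{\sum_{i=1}^{n}\frac{1}{n}\left(1 - \sum\mathcal{S}_{\mathsf{iv}}(N_i)\right)}}\;\mathsf{dm_v}$$

It remains to show that $\sum_{i=1}^{n}\frac{1}{n}\bigl(1 - \sum\mathcal{S}_{\mathsf{iv}}(N_i)\bigr) = 1 - \sum\mathcal{S}_{\mathsf{iv}}(M)$, i.e. that $\sum\mathcal{S}_{\mathsf{iv}}(M) = \sum_{i=1}^{n}\frac{1}{n}\sum\mathcal{S}_{\mathsf{iv}}(N_i)$:

$$\sum\mathcal{S}_{\mathsf{iv}}(M) = \sum\sup_{M \Rightarrow_{\mathsf{iv}} \mathscr{D}} \mathscr{D} = \sum\sup_{N_i \Rightarrow_{\mathsf{iv}} \mathscr{D}_i}\Bigl(\sum_{i=1}^{n}\frac{1}{n}\mathscr{D}_i\Bigr) = \sum_{i=1}^{n}\frac{1}{n}\Bigl(\sum\sup_{N_i \Rightarrow_{\mathsf{iv}} \mathscr{D}_i}\mathscr{D}_i\Bigr) = \sum_{i=1}^{n}\frac{1}{n}\sum\mathcal{S}_{\mathsf{iv}}(N_i).$$

This concludes the proof. □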

We need something more, however: summing any divergence probability and the convergence probability obtained through convergence semantics, we cannot go beyond 1:

Proposition 1 If $M \Rightarrow^{p}_{\infty V}$ and $M \Rightarrow_{IV} \mathcal{D}$, then $p + \sum \mathcal{D} \le 1$.

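Proof. The proof goes by induction on the structure of the derivation for $M \Rightarrow_{IV} \mathcal{D}$:

• If the only rule in the derivation is

$$\frac{}{M \Rightarrow_{IV} \emptyset}\ se_V$$

then $\mathcal{D} = \emptyset$ and $\sum \mathcal{D} = 0$. As a consequence, $p + \sum \mathcal{D} = p \le 1$ by definition of the divergence relation.

• If the only rule in the derivation is

$$\frac{}{V \Rightarrow_{IV} \{V^1\}}\ sv_V$$

then $\sum \mathcal{D} = 1$ and $p = 0$, because the only divergence judgment for a value is $V \Rightarrow^{0}_{\infty V}$ (rule $dv_V$). As a consequence, $p + \sum \mathcal{D} \le 1$.

• If the derivation has the form

$$\frac{M \mapsto_V N_1, \ldots, N_n \qquad N_i \Rightarrow_{IV} \mathcal{D}_i}{M \Rightarrow_{IV} \sum_{i=1}^n \frac{1}{n} \mathcal{D}_i}\ sm_V$$

then $M$ is not a value, so $M \Rightarrow^{p}_{\infty V}$ must come from rule $dm_V$ applied to the same reduction step, i.e., $p = \sum_{i=1}^n \frac{1}{n} p_i$ where $N_i \Rightarrow^{p_i}_{\infty V}$. We apply the induction hypothesis on each $N_i \Rightarrow_{IV} \mathcal{D}_i$ for $i \in [1,n]$: for each $i \in [1,n]$, $\sum \mathcal{D}_i \le 1 - p_i$. Then we have:

$$\sum \mathcal{D} = \sum \sum_{i=1}^n \frac{1}{n} \mathcal{D}_i = \sum_{i=1}^n \frac{1}{n} \sum \mathcal{D}_i \le \sum_{i=1}^n \frac{1}{n}(1 - p_i) = 1 - \sum_{i=1}^n \frac{1}{n} p_i = 1 - p.$$

This concludes the proof. □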



Everything can now be glued together as follows, exploiting the density of real numbers:

Corollary 1 For every $M$, $\mathit{Div}_V(M) + \sum \mathcal{S}_{IV}(M) = 1$.

Proof. $\mathit{Div}_V(M) + \sum \mathcal{S}_{IV}(M) \ge 1$ by Theorem 1. Suppose, by way of contradiction, that $\mathit{Div}_V(M) + \sum \mathcal{S}_{IV}(M) > 1$. This implies $M \Rightarrow^{p}_{\infty V}$ for some $p$ such that $p + \sum \mathcal{S}_{IV}(M) > 1$. This, in turn, implies that $p + \sum \mathcal{D} > 1$ for some $\mathcal{D}$ such that $M \Rightarrow_{IV} \mathcal{D}$. And this is not possible by Proposition 1. □
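The intuition behind Corollary 1 can be checked on a toy example. The following Python sketch abstracts away from terms: it assumes a hypothetical reduction graph in which each non-value reduces deterministically to a list of successors, each taken with probability $1/n$, as in rules $sm_V$ and $dm_V$, and `W` plays the role of $\Omega$. Unfolding `k` steps splits the mass of a state into a converged part, a lower bound on $\sum \mathcal{S}_{IV}(M)$, and an in-flight part, an upper bound on $\mathit{Div}_V(M)$; the two always add up to 1.

```python
from fractions import Fraction

# Hypothetical abstract reduction graph (not terms of the paper): each
# non-value reduces to a list of successors, each weighted 1/n, as in sm_V/dm_V.
STEP = {
    "M": ["V", "M", "W"],   # M reduces to the sequence V, M, W
    "W": ["W"],             # W only reduces to itself, like Omega
}
VALUES = {"V"}

def unfold(state, k):
    """Split the mass of `state` after k steps into (converged, in_flight)."""
    if state in VALUES:
        return Fraction(1), Fraction(0)
    if k == 0:
        return Fraction(0), Fraction(1)
    succ = STEP[state]
    conv = flight = Fraction(0)
    for s in succ:
        c, f = unfold(s, k - 1)
        conv += c / len(succ)
        flight += f / len(succ)
    return conv, flight
```

Here the converged mass after 1, 2 and 3 steps is $1/3$, $4/9$ and $13/27$, approaching $1/2$; the in-flight mass approaches the divergence probability $1/2$ from above.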

5.4 Relating the Various Definitions for Convergence 

Our goal in this section consists in proving that the four ways we have defined the semantics of a term in $\Lambda_\oplus$ (inductive and coinductive, big-step and small-step) attribute identical meanings to any term. One possibility could be to proceed by proving three edges of the following diagram:

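[Diagram: a square with $\mathcal{S}_{IV}$ and $\mathcal{S}_{CV}$ on the top row and $\mathcal{B}_{IV}$ and $\mathcal{B}_{CV}$ on the bottom row]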
The vertical edges relate two formulations given with identical inductive flavors but differing as to whether they are big-step or small-step. Conversely, horizontal edges put in correspondence two formulations which are both big-step or both small-step, but which differ as to which kind of interpretation is taken over the same set of rules. Horizontal edges are definitely more interesting to prove, but vertical ones are important, too. In order to avoid long and tedious (but not necessarily informative) proofs, in this paper we only prove the diagonal edge shown below:



[Diagram: the diagonal edge between $\mathcal{S}_{IV}$ and $\mathcal{B}_{CV}$]

This proof exhibits the difficulties of both vertical and horizontal edges.

Before embarking on the proof of this result, a brief explanation of the architecture of the proof may be useful.

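Consider any term $M$, and define two sets of distributions $\mathcal{M}^s_V$ and $\mathcal{M}^b_V$ as the sets of probability distributions which can be attributed to $M$ in small-step semantics and in big-step semantics, respectively:

$$\mathcal{M}^s_V = \{\mathcal{D} \mid M \Rightarrow_{IV} \mathcal{D}\}; \qquad \mathcal{M}^b_V = \{\mathcal{D} \mid M \Downarrow_{CV} \mathcal{D}\}.$$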

We proceed in two steps:

• First of all, we prove that big-step semantics dominates small-step semantics, namely that $\mathcal{E} \ge \mathcal{D}$ whenever $\mathcal{D} \in \mathcal{M}^s_V$ and $\mathcal{E} \in \mathcal{M}^b_V$. This way, we are sure that $\mathcal{B}_{CV}(M) \ge \mathcal{S}_{IV}(M)$. Details are in Section 5.4.1 below.

• Then, we prove that the small-step semantics can itself be derived using big-step semantics, namely that $\mathcal{S}_{IV}(M) \in \mathcal{M}^b_V$. This way, we immediately obtain that $\mathcal{B}_{CV}(M) \le \mathcal{S}_{IV}(M)$, since $\mathcal{B}_{CV}(M)$ is the infimum of the distributions in $\mathcal{M}^b_V$. Details are in Section 5.4.2.

As a consequence, $\mathcal{B}_{CV}(M) = \mathcal{S}_{IV}(M)$. Figure 1 illustrates the architecture of the proof.

Figure 1: The Overall Picture
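In symbols, the two steps above combine as follows for every term $M$, the supremum and infimum being taken pointwise over the sets just defined. This is only a compact restatement of the argument, not an additional result:

```latex
\begin{align*}
\mathcal{S}_{IV}(M) = \sup \mathcal{M}^s_V
  &\le \inf \mathcal{M}^b_V = \mathcal{B}_{CV}(M)
  && \text{every element of } \mathcal{M}^s_V \text{ is below every element of } \mathcal{M}^b_V \text{ (Section 5.4.1)}\\
\mathcal{B}_{CV}(M) = \inf \mathcal{M}^b_V
  &\le \mathcal{S}_{IV}(M)
  && \text{since } \mathcal{S}_{IV}(M) \in \mathcal{M}^b_V \text{ (Section 5.4.2)}
\end{align*}
```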
5.4.1 Big-step Dominates Small-step 

The fact that any distribution obtained through big-step semantics is bigger than any distribution obtained through small-step semantics can be proved by induction on the structure of (finite!) derivations for the latter. Necessary to that, however, is proving that whenever $M \Rightarrow_{IV} \mathcal{D}$ and $M$ is, say, a sum $N \oplus L$, then appropriate judgments $N \Rightarrow_{IV} \mathcal{E}$ and $L \Rightarrow_{IV} \mathcal{F}$ can be derived. Similarly for applications.

The following lemma formalizes these ideas and is a technical tool for Proposition 2.

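Lemma 2 If $\pi : M \Rightarrow_{IV} \mathcal{D}$, then at least one of the following conditions holds:

1. $\mathcal{D} = \emptyset$;

2. $M$ is a value $V$ and $\mathcal{D} = \{V^1\}$;

3. $M$ is an application $NL$ and there are (finite) distributions $\mathcal{E}$ and $\mathcal{F}$ and, for every $\lambda x.P \in \mathsf{S}(\mathcal{E})$ and $V \in \mathsf{S}(\mathcal{F})$, a distribution $\mathcal{G}_{P,V}$ such that:

   1. $\rho : N \Rightarrow_{IV} \mathcal{E}$, $\xi : L \Rightarrow_{IV} \mathcal{F}$ and $\mu_{P,V} : P\{x/V\} \Rightarrow_{IV} \mathcal{G}_{P,V}$;

   2. $\rho, \xi, \mu_{P,V} \prec \pi$;

   3. $\mathcal{D} \le \sum_{\lambda x.P \in \mathsf{S}(\mathcal{E}),\, V \in \mathsf{S}(\mathcal{F})} \mathcal{E}(\lambda x.P) \cdot \mathcal{F}(V) \cdot \mathcal{G}_{P,V}$;

4. $M$ is a sum $N \oplus L$ and there are (finite) distributions $\mathcal{E}$ and $\mathcal{F}$ such that:

   1. $\rho : N \Rightarrow_{IV} \mathcal{E}$ and $\xi : L \Rightarrow_{IV} \mathcal{F}$;

   2. $\rho, \xi \prec \pi$;

   3. $\mathcal{D} \le \frac{1}{2} \cdot \mathcal{E} \cdot (\sum \mathcal{F}) + \frac{1}{2} \cdot \mathcal{F} \cdot (\sum \mathcal{E})$.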

Proof. First of all, let us prove the following auxiliary lemma:

Lemma 3 If $I$ is finite and $\pi_i : M \Rightarrow_{IV} \mathcal{D}_i$ for every $i \in I$, then there is $\rho : M \Rightarrow_{IV} \sup_{i \in I} \mathcal{D}_i$. Moreover, $\rho \preceq \pi_i$ for some $i \in I$.

The proof goes by induction on the structure of the proofs in the family $\{\pi_i\}_{i \in I}$ (which can actually be done, since $I$ is finite).

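Let us now go back to Lemma 2. This is an induction on derivations $\pi$ for $M \Rightarrow_{IV} \mathcal{D}$. The cases where $\pi$ is obtained by the rules without premises are trivial. So, we can assume that $\pi$ has the form

$$\frac{M \mapsto_V \mathbf{Q} \qquad \{\pi_i : Q_i \Rightarrow_{IV} \mathcal{D}_i\}_{i \in [1,n]}}{M \Rightarrow_{IV} \sum_{i=1}^n \frac{1}{n} \mathcal{D}_i}\ sm_V$$

where $\mathbf{Q}$ is $Q_1, \ldots, Q_n$. Let us distinguish some cases depending on how the premise $M \mapsto_V \mathbf{Q}$ is derived: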

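• Suppose $M = NL$ and that $N \mapsto_V \mathbf{R}$. Then $\mathbf{Q} = \mathbf{R}L$, i.e., $Q_i = R_iL$. From the induction hypothesis applied to the derivations $\pi_i : R_iL \Rightarrow_{IV} \mathcal{D}_i$ we obtain that:

  • Either all of the $\mathcal{D}_i$ are $\emptyset$. In this case $\mathcal{D}$ is itself $\emptyset$, and a derivation for $M \Rightarrow_{IV} \emptyset$ can be defined.

  • Or there is at least one among the derivations $\pi_i$ to which case 3 applies. Suppose that $k_1, \ldots, k_m$ are the indices in $[1,n]$ to which case 3 can be applied; for every other index $j$, case 1 must apply (since $R_jL$ is an application), so $\mathcal{D}_j = \emptyset$. This means that for every $i \in \{1, \ldots, m\}$ there are distributions $\mathcal{E}_i$ and $\mathcal{F}_i$ and, for every $\lambda x.P \in \mathsf{S}(\mathcal{E}_i)$ and $V \in \mathsf{S}(\mathcal{F}_i)$, a distribution $\mathcal{G}^i_{P,V}$ such that:

  $$\rho_i : R_{k_i} \Rightarrow_{IV} \mathcal{E}_i; \qquad \xi_i : L \Rightarrow_{IV} \mathcal{F}_i; \qquad \mu^i_{P,V} : P\{x/V\} \Rightarrow_{IV} \mathcal{G}^i_{P,V}.$$

  Moreover, $\rho_i, \xi_i, \mu^i_{P,V} \prec \pi_{k_i}$, and

  $$\mathcal{D}_{k_i} \le \sum_{\lambda x.P \in \mathsf{S}(\mathcal{E}_i),\, V \in \mathsf{S}(\mathcal{F}_i)} \mathcal{E}_i(\lambda x.P) \cdot \mathcal{F}_i(V) \cdot \mathcal{G}^i_{P,V}.$$

  Now, define $\mathcal{E}$ as the distribution

  $$\mathcal{E} = \sum_{i=1}^m \frac{1}{n} \mathcal{E}_i.$$

  Note that $\mathsf{S}(\mathcal{E}) = \bigcup_{i=1}^m \mathsf{S}(\mathcal{E}_i)$. Clearly, a derivation $\rho$ for $N \Rightarrow_{IV} \mathcal{E}$ can be defined such that $\rho \prec \pi$: simply construct it from the derivations $\rho_i$ and some derivations for $R_j \Rightarrow_{IV} \emptyset$. Moreover, a derivation $\xi$ of $L \Rightarrow_{IV} \mathcal{F}$, where $\mathcal{F} = \sup_{i=1}^m \mathcal{F}_i$, can be defined such that $\xi \prec \pi$: use Lemma 3. Observe that $\mathsf{S}(\mathcal{F}) = \bigcup_{i=1}^m \mathsf{S}(\mathcal{F}_i)$. Similarly, derivations $\mu_{P,V}$ can be defined for every $\lambda x.P \in \mathsf{S}(\mathcal{E})$ and for every $V \in \mathsf{S}(\mathcal{F})$ in such a way that $\mu_{P,V} \prec \pi$, $\mu_{P,V} : P\{x/V\} \Rightarrow_{IV} \mathcal{G}_{P,V}$ and $\mathcal{G}_{P,V} = \sup_i \mathcal{G}^i_{P,V}$ (the supremum ranging over the indices $i$ for which $\mathcal{G}^i_{P,V}$ is defined). Now:

  $$\begin{aligned}
  \mathcal{D} = \sum_{i=1}^n \frac{1}{n} \mathcal{D}_i = \sum_{i=1}^m \frac{1}{n} \mathcal{D}_{k_i}
  &\le \sum_{i=1}^m \frac{1}{n} \Big( \sum_{\lambda x.P \in \mathsf{S}(\mathcal{E}_i),\, V \in \mathsf{S}(\mathcal{F}_i)} \mathcal{E}_i(\lambda x.P) \cdot \mathcal{F}_i(V) \cdot \mathcal{G}^i_{P,V} \Big)\\
  &\le \sum_{i=1}^m \frac{1}{n} \Big( \sum_{\lambda x.P \in \mathsf{S}(\mathcal{E}_i),\, V \in \mathsf{S}(\mathcal{F}_i)} \mathcal{E}_i(\lambda x.P) \cdot \mathcal{F}(V) \cdot \mathcal{G}_{P,V} \Big)\\
  &\le \sum_{\lambda x.P \in \mathsf{S}(\mathcal{E}),\, V \in \mathsf{S}(\mathcal{F})} \sum_{i=1}^m \frac{1}{n} \big( \mathcal{E}_i(\lambda x.P) \cdot \mathcal{F}(V) \cdot \mathcal{G}_{P,V} \big)\\
  &= \sum_{\lambda x.P \in \mathsf{S}(\mathcal{E}),\, V \in \mathsf{S}(\mathcal{F})} \mathcal{E}(\lambda x.P) \cdot \mathcal{F}(V) \cdot \mathcal{G}_{P,V}.
  \end{aligned}$$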

• Suppose $M = NL$ and that $L \mapsto_V \mathbf{R}$. Then $\mathbf{Q} = N\mathbf{R}$. This case is very similar to the previous one. Note that, since we reduce in a call-by-value setting, $N \in \mathit{Val}$ and, by means of the small-step semantics rules, $N \Rightarrow_{IV} \{N^1\}$.

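• Suppose $M = NL$ and that we are in presence of a redex, i.e., $N$ is of the form $\lambda x.R$ and $L$ is a value, so that $L \Rightarrow_{IV} \{L^1\}$. Then $\mathbf{Q}$ is the unary sequence $R\{x/L\}$. The thesis easily follows, taking $\mathcal{E} = \{(\lambda x.R)^1\}$, $\mathcal{F} = \{L^1\}$ and the premise $\pi_1 : R\{x/L\} \Rightarrow_{IV} \mathcal{D}$ as $\mu_{R,L}$, with $\mathcal{G}_{R,L} = \mathcal{D}$.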

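• Suppose $M = N \oplus L$ and that $N \mapsto_V \mathbf{R}$. Then $\mathbf{Q} = \mathbf{R} \oplus L$, i.e., $Q_i = R_i \oplus L$. From the induction hypothesis applied to the derivations $\pi_i : R_i \oplus L \Rightarrow_{IV} \mathcal{D}_i$ we have that:

  • Either all of the distributions $\mathcal{D}_i$ are $\emptyset$; then $\mathcal{D}$ is itself $\emptyset$ and a derivation for $M \Rightarrow_{IV} \emptyset$ can be defined.

  • Or there is at least one derivation among the $\pi_i$ to which case 4 applies. Suppose that $k_1, \ldots, k_m$ are the indices in $[1,n]$ to which case 4 can be applied (for the other indices, case 1 applies). For every $i \in \{1, \ldots, m\}$, there are two distributions $\mathcal{E}_i$ and $\mathcal{F}_i$ such that, again by induction hypothesis,

  $$\rho_i : R_{k_i} \Rightarrow_{IV} \mathcal{E}_i; \qquad \xi_i : L \Rightarrow_{IV} \mathcal{F}_i.$$

  Moreover, $\rho_i, \xi_i \prec \pi_{k_i}$ and $\mathcal{D}_{k_i} \le \frac{1}{2} \cdot \mathcal{E}_i \cdot (\sum \mathcal{F}_i) + \frac{1}{2} \cdot \mathcal{F}_i \cdot (\sum \mathcal{E}_i)$. Let us define $\mathcal{E}$ as the distribution $\sum_{i=1}^m \frac{1}{n} \mathcal{E}_i$ and observe that a derivation $\rho : N \Rightarrow_{IV} \mathcal{E}$ such that $\rho \prec \pi$ can be defined from the derivations $\rho_i$ (and some derivations for $R_j \Rightarrow_{IV} \emptyset$). Moreover, a derivation $\xi : L \Rightarrow_{IV} \mathcal{F}$ can be defined such that $\xi \prec \pi$, taking $\mathcal{F} = \sup_{i=1}^m \mathcal{F}_i$ and applying Lemma 3. Finally we have:

  $$\begin{aligned}
  \mathcal{D} = \sum_{i=1}^m \frac{1}{n} \mathcal{D}_{k_i}
  &\le \sum_{i=1}^m \frac{1}{n} \Big( \frac{1}{2} \cdot \mathcal{E}_i \cdot \big(\textstyle\sum \mathcal{F}_i\big) \Big) + \sum_{i=1}^m \frac{1}{n} \Big( \frac{1}{2} \cdot \mathcal{F}_i \cdot \big(\textstyle\sum \mathcal{E}_i\big) \Big)\\
  &\le \sum_{i=1}^m \frac{1}{n} \Big( \frac{1}{2} \cdot \mathcal{E}_i \cdot \big(\textstyle\sum \mathcal{F}\big) \Big) + \sum_{i=1}^m \frac{1}{n} \Big( \frac{1}{2} \cdot \mathcal{F} \cdot \big(\textstyle\sum \mathcal{E}_i\big) \Big)\\
  &= \frac{1}{2} \cdot \Big( \sum_{i=1}^m \frac{1}{n} \mathcal{E}_i \Big) \cdot \big(\textstyle\sum \mathcal{F}\big) + \frac{1}{2} \cdot \mathcal{F} \cdot \Big( \sum_{i=1}^m \frac{1}{n} \textstyle\sum \mathcal{E}_i \Big)\\
  &= \frac{1}{2} \cdot \mathcal{E} \cdot \big(\textstyle\sum \mathcal{F}\big) + \frac{1}{2} \cdot \mathcal{F} \cdot \big(\textstyle\sum \mathcal{E}\big),
  \end{aligned}$$

  where for each $i \in [1,m]$, $\mathcal{F}_i \le \mathcal{F}$ holds because $\mathcal{F}$ is the least upper bound of the $\mathcal{F}_i$.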

• Suppose $M = Z \oplus L$ (with $Z \in \mathit{Val}$) and that $L \mapsto_V \mathbf{R}$. Then $\mathbf{Q} = Z \oplus \mathbf{R}$. This case is similar to the previous one.

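• Suppose $M = V \oplus W$ and that $M \mapsto_V V, W$. Then $\mathcal{D} = \frac{1}{2}\mathcal{D}_1 + \frac{1}{2}\mathcal{D}_2$, where $\mathcal{D}_1 \le \{V^1\}$ and $\mathcal{D}_2 \le \{W^1\}$. In this case the subderivations which we are looking for are $\rho : V \Rightarrow_{IV} \{V^1\}$ and $\xi : W \Rightarrow_{IV} \{W^1\}$, both obtained by rule $sv_V$, and indeed $\mathcal{D} \le \frac{1}{2} \cdot \{V^1\} + \frac{1}{2} \cdot \{W^1\}$.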

This concludes the proof. □ 

Now, suppose that $M \Rightarrow_{IV} \mathcal{D}$ and $M \Downarrow_{CV} \mathcal{K}$. Lemma 2 provides all that is needed to unfold the hypothesis $M \Rightarrow_{IV} \mathcal{D}$ and obtain judgments matching exactly those coming from $M \Downarrow_{CV} \mathcal{K}$. We easily get:

Proposition 2 If $M \Rightarrow_{IV} \mathcal{D}$ and $M \Downarrow_{CV} \mathcal{K}$, then $\mathcal{D} \le \mathcal{K}$.

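Proof. By induction on the structure of a proof for $M \Rightarrow_{IV} \mathcal{D}$, applying Lemma 2 and doing some case analysis based on its outcome:

• If $\mathcal{D} = \emptyset$, then $\mathcal{D} \le \mathcal{K}$ trivially.

• If $M = V$ and $\mathcal{D} = \{V^1\}$, then $\mathcal{K} = \mathcal{D}$, because the only rule for values in the big-step semantics is

$$\frac{}{V \Downarrow_{CV} \{V^1\}}\ bv_V$$

• If $M$ is an application $NL$ and distributions $\mathcal{E}, \mathcal{F}, \mathcal{G}_{P,V}$ and derivations $\rho, \xi, \mu_{P,V}$ exist as in Lemma 2, observe that $M \Downarrow_{CV} \mathcal{K}$ must be the conclusion of an instance of rule $ba_V$ with premises $N \Downarrow_{CV} \mathcal{X}$, $L \Downarrow_{CV} \mathcal{Y}$ and $P\{x/V\} \Downarrow_{CV} \mathcal{Z}_{P,V}$ for every $\lambda x.P \in \mathsf{S}(\mathcal{X})$ and $V \in \mathsf{S}(\mathcal{Y})$, where $\mathcal{K} = \sum \mathcal{X}(\lambda x.P) \cdot \mathcal{Y}(V) \cdot \mathcal{Z}_{P,V}$. By induction hypothesis, $\mathcal{X} \ge \mathcal{E}$ and $\mathcal{Y} \ge \mathcal{F}$, hence $\mathsf{S}(\mathcal{X}) \supseteq \mathsf{S}(\mathcal{E})$ and $\mathsf{S}(\mathcal{Y}) \supseteq \mathsf{S}(\mathcal{F})$. Again by induction hypothesis, we obtain that $\mathcal{Z}_{P,V} \ge \mathcal{G}_{P,V}$ for every $\lambda x.P \in \mathsf{S}(\mathcal{E})$ and $V \in \mathsf{S}(\mathcal{F})$. Finally:

$$\begin{aligned}
\mathcal{D} &\le \sum_{\lambda x.P \in \mathsf{S}(\mathcal{E}),\, V \in \mathsf{S}(\mathcal{F})} \mathcal{E}(\lambda x.P) \cdot \mathcal{F}(V) \cdot \mathcal{G}_{P,V}\\
&\le \sum_{\lambda x.P \in \mathsf{S}(\mathcal{E}),\, V \in \mathsf{S}(\mathcal{F})} \mathcal{X}(\lambda x.P) \cdot \mathcal{Y}(V) \cdot \mathcal{Z}_{P,V}\\
&\le \sum_{\lambda x.P \in \mathsf{S}(\mathcal{X}),\, V \in \mathsf{S}(\mathcal{Y})} \mathcal{X}(\lambda x.P) \cdot \mathcal{Y}(V) \cdot \mathcal{Z}_{P,V} = \mathcal{K}.
\end{aligned}$$

• If $M$ is $N \oplus L$, we can proceed exactly as in the previous case. In fact, there exist distributions $\mathcal{E}, \mathcal{F}$ and derivations $\rho, \xi$ as in Lemma 2, and $M \Downarrow_{CV} \mathcal{K}$ must be the conclusion of an instance of rule $bs_V$ with premises $N \Downarrow_{CV} \mathcal{X}$ and $L \Downarrow_{CV} \mathcal{Y}$, where $\mathcal{K} = \frac{1}{2} \cdot \mathcal{X} \cdot (\sum \mathcal{Y}) + \frac{1}{2} \cdot \mathcal{Y} \cdot (\sum \mathcal{X})$. By induction hypothesis, $\mathcal{X} \ge \mathcal{E}$ and $\mathcal{Y} \ge \mathcal{F}$. Then we have:

$$\mathcal{D} \le \frac{1}{2} \cdot \mathcal{E} \cdot \big(\textstyle\sum \mathcal{F}\big) + \frac{1}{2} \cdot \mathcal{F} \cdot \big(\textstyle\sum \mathcal{E}\big) \le \frac{1}{2} \cdot \mathcal{X} \cdot \big(\textstyle\sum \mathcal{Y}\big) + \frac{1}{2} \cdot \mathcal{Y} \cdot \big(\textstyle\sum \mathcal{X}\big) = \mathcal{K}.$$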

This concludes the proof. □ 
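Proposition 2 and Lemma 3 rely on the pointwise order and the pointwise supremum of distributions. A minimal Python sketch of both, assuming finite distributions represented as dictionaries from (hypothetical) value labels to `Fraction` weights. The supremum of an arbitrary family need not be a distribution, since its total mass can exceed 1; Lemma 3 rules this out for finitely many distributions derived for the same term, because their supremum is itself derivable.

```python
from fractions import Fraction

def leq(d, e):
    """Pointwise order: d <= e iff d(V) <= e(V) for every value V."""
    return all(p <= e.get(v, Fraction(0)) for v, p in d.items())

def sup(ds):
    """Pointwise supremum of a finite family of distributions."""
    out = {}
    for d in ds:
        for v, p in d.items():
            out[v] = max(out.get(v, Fraction(0)), p)
    return out

def mass(d):
    """Total probability of a finite distribution."""
    return sum(d.values(), Fraction(0))
```

For instance, the supremum of $\{V^{1/4}\}$ and $\{V^{1/2}, W^{1/8}\}$ is $\{V^{1/2}, W^{1/8}\}$, while the supremum of $\{V^1\}$ and $\{W^1\}$ has total mass 2.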



5.4.2 Small-Step is in Big-Step 

Whenever $M \Rightarrow_{IV} \mathcal{D}$ and $M$ is not a value, one can always decompose $\mathcal{D}$ and find some judgments about the immediate subterms of $M$. This is Lemma 2. If we want to prove that the small-step semantics $\mathcal{S}_{IV}(M)$ of $M$ can itself be attributed to $M$ in the big-step case, we should somehow prove the converse, namely that judgments about the immediate subterms of $M$ can be packaged into an analogous judgment for $M$.

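Lemma 4 Let $M \in \Lambda_\oplus$ be any term. Then:

1. If $M$ is a value $V$ and $M \Rightarrow_{IV} \mathcal{D}$, then $\mathcal{D} \le \{V^1\}$.

2. If $M$ is an application $NL$, $N \Rightarrow_{IV} \mathcal{E}$, $L \Rightarrow_{IV} \mathcal{F}$ and $\{P\{x/V\} \Rightarrow_{IV} \mathcal{G}_{P,V}\}_{\lambda x.P \in \mathsf{S}(\mathcal{E}),\, V \in \mathsf{S}(\mathcal{F})}$, then there exists a distribution $\mathcal{D}$ such that $M \Rightarrow_{IV} \mathcal{D}$ and $\mathcal{D} \ge \sum_{\lambda x.P \in \mathsf{S}(\mathcal{E}),\, V \in \mathsf{S}(\mathcal{F})} \mathcal{E}(\lambda x.P) \cdot \mathcal{F}(V) \cdot \mathcal{G}_{P,V}$.

3. If $M$ is a sum $N \oplus L$, $N \Rightarrow_{IV} \mathcal{E}$ and $L \Rightarrow_{IV} \mathcal{F}$, then there exists a distribution $\mathcal{D}$ such that $M \Rightarrow_{IV} \mathcal{D}$ and $\mathcal{D} \ge \frac{1}{2} \cdot \mathcal{E} \cdot \sum \mathcal{F} + \frac{1}{2} \cdot \mathcal{F} \cdot \sum \mathcal{E}$.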

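Proof. Let us prove the three statements separately:

• If $M$ is a value $V$, then the only possible judgments involving $M$ are $M \Rightarrow_{IV} \{V^1\}$ and $M \Rightarrow_{IV} \emptyset$. The thesis trivially holds.

• If $M$ is an application $NL$, then we prove statement 2 by induction on the derivations for $N \Rightarrow_{IV} \mathcal{E}$ and $L \Rightarrow_{IV} \mathcal{F}$. Let us distinguish some cases:

  • If $N \Rightarrow_{IV} \emptyset$, then $\sum \mathcal{E}(\lambda x.P) \cdot \mathcal{F}(V) \cdot \mathcal{G}_{P,V} = \emptyset$. We derive $M \Rightarrow_{IV} \emptyset$ by means of rule $se_V$ and the thesis holds.

  • If $N \Rightarrow_{IV} \mathcal{E} \neq \emptyset$, suppose that the last rule applied in the derivation is

  $$\frac{N \mapsto_V \mathbf{R} \qquad R_i \Rightarrow_{IV} \mathcal{E}_i}{N \Rightarrow_{IV} \sum_{i=1}^n \frac{1}{n} \mathcal{E}_i}\ sm_V$$

  Note that $\mathcal{E} = \sum_{i=1}^n \frac{1}{n} \mathcal{E}_i$. It is possible to apply the induction hypothesis on every $R_iL$: since $R_i \Rightarrow_{IV} \mathcal{E}_i$, $L \Rightarrow_{IV} \mathcal{F}$ and $P\{x/V\} \Rightarrow_{IV} \mathcal{G}_{P,V}$ (for $\lambda x.P \in \mathsf{S}(\mathcal{E}_i) \subseteq \mathsf{S}(\mathcal{E})$ and $V \in \mathsf{S}(\mathcal{F})$), it follows that $R_iL \Rightarrow_{IV} \mathcal{H}_i$ with $\mathcal{H}_i \ge \sum_{\lambda x.P \in \mathsf{S}(\mathcal{E}_i),\, V \in \mathsf{S}(\mathcal{F})} \mathcal{E}_i(\lambda x.P) \cdot \mathcal{F}(V) \cdot \mathcal{G}_{P,V}$. We are able to construct a derivation in the following way:

  $$\frac{NL \mapsto_V \mathbf{R}L \qquad R_iL \Rightarrow_{IV} \mathcal{H}_i}{NL \Rightarrow_{IV} \sum_{i=1}^n \frac{1}{n} \mathcal{H}_i}\ sm_V$$

  And finally, taking $\mathcal{D} = \sum_{i=1}^n \frac{1}{n} \mathcal{H}_i$, we have:

  $$\begin{aligned}
  \mathcal{D} = \sum_{i=1}^n \frac{1}{n} \mathcal{H}_i
  &\ge \sum_{i=1}^n \frac{1}{n} \Big( \sum_{\lambda x.P \in \mathsf{S}(\mathcal{E}_i),\, V \in \mathsf{S}(\mathcal{F})} \mathcal{E}_i(\lambda x.P) \cdot \mathcal{F}(V) \cdot \mathcal{G}_{P,V} \Big)\\
  &= \sum_{\lambda x.P \in \mathsf{S}(\mathcal{E}),\, V \in \mathsf{S}(\mathcal{F})} \Big( \sum_{i=1}^n \frac{1}{n} \mathcal{E}_i(\lambda x.P) \Big) \cdot \mathcal{F}(V) \cdot \mathcal{G}_{P,V}\\
  &= \sum_{\lambda x.P \in \mathsf{S}(\mathcal{E}),\, V \in \mathsf{S}(\mathcal{F})} \mathcal{E}(\lambda x.P) \cdot \mathcal{F}(V) \cdot \mathcal{G}_{P,V}.
  \end{aligned}$$

  • The other cases can be handled similarly to the previous one.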

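• If $M$ is a sum $N \oplus L$, we prove statement 3 by induction on the derivations for $N \Rightarrow_{IV} \mathcal{E}$ and $L \Rightarrow_{IV} \mathcal{F}$.

  • If $N \Rightarrow_{IV} \emptyset$, then $\frac{1}{2} \cdot \emptyset \cdot \sum \mathcal{F} + \frac{1}{2} \cdot \mathcal{F} \cdot \sum \emptyset = \emptyset$. We derive $M \Rightarrow_{IV} \emptyset$ by means of rule $se_V$ and the thesis holds.

  • If $N$ is a value $V$, the only interesting case is the one in which $L$ is not a value and $L \Rightarrow_{IV} \mathcal{F}$ with $\mathcal{F} \neq \emptyset$ (in fact, if $L \Rightarrow_{IV} \emptyset$ we are in a case symmetric to the previous one, and if $L$ is a value the proof is trivial). Then $M = V \oplus L$. Let us consider the derivation of the judgment $L \Rightarrow_{IV} \mathcal{F}$; the last rule applied in the derivation is

  $$\frac{L \mapsto_V \mathbf{R} \qquad R_i \Rightarrow_{IV} \mathcal{F}_i}{L \Rightarrow_{IV} \sum_{i=1}^n \frac{1}{n} \mathcal{F}_i}\ sm_V$$

  where $\mathcal{F} = \sum_{i=1}^n \frac{1}{n} \mathcal{F}_i$. Observe that $M \mapsto_V V \oplus \mathbf{R}$. We can apply the induction hypothesis to each $R_i \Rightarrow_{IV} \mathcal{F}_i$ and to $V \Rightarrow_{IV} \{V^1\}$, and we obtain $V \oplus R_i \Rightarrow_{IV} \mathcal{H}_i$ with

  $$\mathcal{H}_i \ge \frac{1}{2} \cdot \{V^1\} \cdot \sum \mathcal{F}_i + \frac{1}{2} \cdot \mathcal{F}_i.$$

  We are able to construct a derivation in the following way:

  $$\frac{V \oplus L \mapsto_V V \oplus \mathbf{R} \qquad V \oplus R_i \Rightarrow_{IV} \mathcal{H}_i}{V \oplus L \Rightarrow_{IV} \sum_{i=1}^n \frac{1}{n} \mathcal{H}_i}\ sm_V$$

  Finally we have:

  $$\sum_{i=1}^n \frac{1}{n} \mathcal{H}_i \ge \sum_{i=1}^n \frac{1}{n} \Big( \frac{1}{2} \cdot \{V^1\} \cdot \sum \mathcal{F}_i + \frac{1}{2} \cdot \mathcal{F}_i \Big) = \frac{1}{2} \cdot \{V^1\} \cdot \sum \mathcal{F} + \frac{1}{2} \cdot \mathcal{F} \cdot \sum \{V^1\}.$$

  • If $N$ is not a value and $N \Rightarrow_{IV} \mathcal{E} \neq \emptyset$, suppose that the last rule applied in the derivation is

  $$\frac{N \mapsto_V \mathbf{R} \qquad R_i \Rightarrow_{IV} \mathcal{E}_i}{N \Rightarrow_{IV} \sum_{i=1}^n \frac{1}{n} \mathcal{E}_i}\ sm_V$$

  Note that $\mathcal{E} = \sum_{i=1}^n \frac{1}{n} \mathcal{E}_i$. It is possible to apply the induction hypothesis on each $R_i \oplus L$: since $R_i \Rightarrow_{IV} \mathcal{E}_i$ and $L \Rightarrow_{IV} \mathcal{F}$, we have that $R_i \oplus L \Rightarrow_{IV} \mathcal{H}_i$ with $\mathcal{H}_i \ge \frac{1}{2} \cdot \mathcal{E}_i \cdot \sum \mathcal{F} + \frac{1}{2} \cdot \mathcal{F} \cdot \sum \mathcal{E}_i$. We construct a derivation in the following way:

  $$\frac{N \oplus L \mapsto_V \mathbf{R} \oplus L \qquad R_i \oplus L \Rightarrow_{IV} \mathcal{H}_i}{N \oplus L \Rightarrow_{IV} \sum_{i=1}^n \frac{1}{n} \mathcal{H}_i}\ sm_V$$

  Finally we have:

  $$\begin{aligned}
  \sum_{i=1}^n \frac{1}{n} \mathcal{H}_i
  &\ge \sum_{i=1}^n \frac{1}{n} \Big( \frac{1}{2} \cdot \mathcal{E}_i \cdot \sum \mathcal{F} \Big) + \sum_{i=1}^n \frac{1}{n} \Big( \frac{1}{2} \cdot \mathcal{F} \cdot \sum \mathcal{E}_i \Big)\\
  &= \frac{1}{2} \cdot \Big( \sum_{i=1}^n \frac{1}{n} \mathcal{E}_i \Big) \cdot \sum \mathcal{F} + \frac{1}{2} \cdot \mathcal{F} \cdot \sum_{i=1}^n \frac{1}{n} \sum \mathcal{E}_i = \frac{1}{2} \cdot \mathcal{E} \cdot \sum \mathcal{F} + \frac{1}{2} \cdot \mathcal{F} \cdot \sum \mathcal{E}.
  \end{aligned}$$

  • Other cases are trivial.

This concludes the proof. □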

Lemma 2 and Lemma 4 allow us to prove that $\mathcal{S}_{IV}(\cdot)$ commutes well with the various constructs of $\Lambda_\oplus$.

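Lemma 5 For each term $M \in \Lambda_\oplus$:

1. If $M$ is a value $V$, then $\mathcal{S}_{IV}(M) = \{V^1\}$;

2. If $M$ is an application $NL$, then $\mathcal{S}_{IV}(M) = \sum_{\lambda x.P \in \mathsf{S}(\mathcal{S}_{IV}(N)),\, V \in \mathsf{S}(\mathcal{S}_{IV}(L))} \mathcal{S}_{IV}(N)(\lambda x.P) \cdot \mathcal{S}_{IV}(L)(V) \cdot \mathcal{S}_{IV}(P\{x/V\})$;

3. If $M$ is a sum $N \oplus L$, then $\mathcal{S}_{IV}(M) = \frac{1}{2} \cdot \mathcal{S}_{IV}(N) \cdot \sum \mathcal{S}_{IV}(L) + \frac{1}{2} \cdot \mathcal{S}_{IV}(L) \cdot \sum \mathcal{S}_{IV}(N)$.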
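Proof. We will use the following fact throughout the proof:

Fact 1 If $M \mapsto_V \mathbf{N}$, then $\sup_{M \Rightarrow_{IV} \mathcal{D}} \mathcal{D} = \sup_{N_1 \Rightarrow_{IV} \mathcal{E}_1, \ldots, N_n \Rightarrow_{IV} \mathcal{E}_n} \big( \sum_{i=1}^n \frac{1}{n} \mathcal{E}_i \big)$.

The two inequalities can be proved separately:

• If $M$ is a value, the thesis follows from the small-step semantics rules. Indeed, $M \Rightarrow_{IV} \{M^1\}$, and the only other judgment derivable for $M$ is $M \Rightarrow_{IV} \emptyset$.

• For the other cases, the ($\le$) direction follows from Lemma 2 and Fact 1, while the ($\ge$) direction follows from Lemma 4 and Fact 1.

This concludes the proof. □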
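The equations of Lemma 5 can be evaluated directly on finite distributions. A minimal Python sketch, assuming distributions are dictionaries from (hypothetical) value labels to `Fraction` weights; `sem_body` is a caller-supplied stand-in for $\mathcal{S}_{IV}(P\{x/V\})$, since computing it in general requires evaluating the body. In the sum case each component is weighted by the total mass of the other, so a sum converges only to the extent that both of its components do.

```python
from fractions import Fraction

# Distributions are dicts from (hypothetical) value labels to Fraction weights.

def mass(d):
    return sum(d.values(), Fraction(0))

def _add(out, d, c):
    """out += c * d, pointwise."""
    for v, p in d.items():
        out[v] = out.get(v, Fraction(0)) + c * p

def _clean(d):
    """Drop zero weights, so that keys are exactly the support."""
    return {v: p for v, p in d.items() if p}

def cbv_sum(dN, dL):
    """Lemma 5(3): 1/2 * S(N) * sum S(L)  +  1/2 * S(L) * sum S(N)."""
    out = {}
    _add(out, dN, Fraction(1, 2) * mass(dL))
    _add(out, dL, Fraction(1, 2) * mass(dN))
    return _clean(out)

def cbv_app(dN, dL, sem_body):
    """Lemma 5(2): sum over lam.P in S(S(N)), V in S(S(L)) of
    S(N)(lam.P) * S(L)(V) * S(P{x/V}), with sem_body(lam, V) giving S(P{x/V})."""
    out = {}
    for lam, p in dN.items():
        for v, q in dL.items():
            _add(out, sem_body(lam, v), p * q)
    return _clean(out)
```

For example, taking $\mathcal{S}_{IV}(\Omega) = \emptyset$ and $\mathcal{S}_{IV}(\lambda x.x) = \{(\lambda x.x)^1\}$, `cbv_sum` returns the empty distribution for $\Omega \oplus (\lambda x.x)$.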

The fact that $\mathcal{S}_{IV}(M)$ can be assigned to $M$ in the big-step semantics is an easy consequence of Lemma 5.

Proposition 3 $M \Downarrow_{CV} \mathcal{S}_{IV}(M)$.

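Proof. We will prove the thesis by coinduction: we prove that all judgments $M \Downarrow_{CV} \mathcal{S}_{IV}(M)$ belong to the coinductive interpretation of the underlying formal system $\Phi$ (in this case, $\Phi = \{bv_V, ba_V, bs_V\}$). To do that, we need to prove that the set $A$ of all those judgments is consistent, i.e., that $A \subseteq F_\Phi(A)$. This amounts to showing that if $c \in A$, then there is a derivation for $c$ whose immediate premises are themselves in $A$. Let us distinguish some cases:

• If $M = V$, then $V \Downarrow_{CV} \{V^1\}$ by rule $bv_V$, and $\{V^1\} = \mathcal{S}_{IV}(V)$ because of Lemma 5.

• If $M$ is an application $NL$, take the judgments $c_1 = N \Downarrow_{CV} \mathcal{S}_{IV}(N)$ and $c_2 = L \Downarrow_{CV} \mathcal{S}_{IV}(L)$ and the family of judgments $\{P\{x/V\} \Downarrow_{CV} \mathcal{S}_{IV}(P\{x/V\})\}_{\lambda x.P \in \mathsf{S}(\mathcal{S}_{IV}(N)),\, V \in \mathsf{S}(\mathcal{S}_{IV}(L))}$, all of which belong to $A$: we will prove that the judgment $NL \Downarrow_{CV} \mathcal{S}_{IV}(NL)$ can be derived in a single step from $c_1$, $c_2$ and those in the family above by means of rule $ba_V$. Simply observe that

$$\frac{N \Downarrow_{CV} \mathcal{S}_{IV}(N) \qquad L \Downarrow_{CV} \mathcal{S}_{IV}(L) \qquad \{P\{x/V\} \Downarrow_{CV} \mathcal{S}_{IV}(P\{x/V\})\}_{\lambda x.P \in \mathsf{S}(\mathcal{S}_{IV}(N)),\, V \in \mathsf{S}(\mathcal{S}_{IV}(L))}}{NL \Downarrow_{CV} \sum_{\lambda x.P \in \mathsf{S}(\mathcal{S}_{IV}(N)),\, V \in \mathsf{S}(\mathcal{S}_{IV}(L))} \mathcal{S}_{IV}(N)(\lambda x.P) \cdot \mathcal{S}_{IV}(L)(V) \cdot \mathcal{S}_{IV}(P\{x/V\})}\ ba_V$$

The thesis follows by applying Lemma 5.

• If $M$ is a sum $N \oplus L$, take the judgments $c_1 = N \Downarrow_{CV} \mathcal{S}_{IV}(N)$ and $c_2 = L \Downarrow_{CV} \mathcal{S}_{IV}(L)$: we will prove that the judgment $N \oplus L \Downarrow_{CV} \mathcal{S}_{IV}(N \oplus L)$ can be inferred in a single step from $c_1$ and $c_2$ by means of rule $bs_V$. Clearly, $c_1$ and $c_2$ belong to $A$. Moreover

$$\frac{N \Downarrow_{CV} \mathcal{S}_{IV}(N) \qquad L \Downarrow_{CV} \mathcal{S}_{IV}(L)}{N \oplus L \Downarrow_{CV} \frac{1}{2} \cdot \mathcal{S}_{IV}(N) \cdot \sum \mathcal{S}_{IV}(L) + \frac{1}{2} \cdot \mathcal{S}_{IV}(L) \cdot \sum \mathcal{S}_{IV}(N)}\ bs_V$$

and by Lemma 5, case 3, we obtain the thesis.

This concludes the proof. □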

The equality between big-step and small-step semantics is a corollary of Proposition 2 and Proposition 3:

Theorem 2 $\mathcal{B}_{CV}(M) = \mathcal{S}_{IV}(M)$.



6 Call-by-Name 

In Section 5 we endowed $\Lambda_\oplus$ with a call-by-value probabilistic operational semantics and showed that the distribution assigned to any term $M$ is the same in big-step and in small-step semantics, independently of whether they are defined inductively or coinductively. Actually, the same holds in call-by-name: both big-step and small-step semantics can be defined (co)inductively and proved equivalent, following the same path used in call-by-value. In this section, we briefly sketch how this can be done, by defining $\mathcal{B}_{CN}(M)$ and $\mathcal{S}_{IN}(M)$ and by proving they are equivalent.
Definition 8 (Call-by-name Reduction) Leftmost reduction $\mapsto_N$ is the least binary relation between $\Lambda_\oplus$ and finite sequences of terms in $\Lambda_\oplus$ such that:

$$(\lambda x.M)N \mapsto_N M\{x/N\} \qquad MN \mapsto_N \mathbf{L}N \ \text{ if } M \mapsto_N \mathbf{L} \qquad M \oplus N \mapsto_N M, N$$

Note that, contrary to call-by-value, in call-by-name it is possible to perform a choice between terms which are not values.



6.1 Small-step Semantics 

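As in call-by-value, we model terminating and non-terminating computations separately. The rule schema is the same, up to the different reduction relation $\mapsto_N$. First of all, there is an inductively defined binary relation $\Rightarrow_{IN}$ between $\Lambda_\oplus$ and distributions. Rules are as follows:

$$\frac{}{M \Rightarrow_{IN} \emptyset}\ se_N \qquad \frac{}{V \Rightarrow_{IN} \{V^1\}}\ sv_N \qquad \frac{M \mapsto_N \mathbf{N} \qquad N_i \Rightarrow_{IN} \mathcal{D}_i}{M \Rightarrow_{IN} \sum_{i=1}^n \frac{1}{n} \mathcal{D}_i}\ sm_N$$

$\mathcal{S}_{IN}(M)$ is the distribution $\sup_{M \Rightarrow_{IN} \mathcal{D}} \mathcal{D}$. Moreover, there is also another, coinductively defined binary relation between $\Lambda_\oplus$ and $\mathbb{R}_{[0,1]}$ capturing divergence. Rules are as follows:

$$\frac{}{V \Rightarrow^{0}_{\infty N}}\ dv_N \qquad \frac{M \mapsto_N \mathbf{N} \qquad \{N_i \Rightarrow^{p_i}_{\infty N}\}_{i \in [1,n]}}{M \Rightarrow^{\sum_{i=1}^n \frac{1}{n} p_i}_{\infty N}}\ dm_N$$

$\mathit{Div}_N(M)$ is nothing more than $\sup_{M \Rightarrow^{p}_{\infty N}} p$. Notice how the differences between call-by-value and call-by-name small-step semantics all come from the reduction relation, since the rules above are analogous to their call-by-value siblings.

As done in Section 5.1.3 for call-by-value, a coinductive version of call-by-name small-step semantics can be easily defined.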



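6.2 Big-step Semantics

We define the call-by-name big-step semantics of terms in $\Lambda_\oplus$ as the coinductive interpretation of a suitable set of rules. Again, this allows us to capture infinite computations. A coinductively defined binary relation $\Downarrow_{CN}$ between $\Lambda_\oplus$ and distributions is obtained by taking all instances of the following rules:

$$\frac{}{V \Downarrow_{CN} \{V^1\}}\ bv_N \qquad \frac{M \Downarrow_{CN} \mathcal{D} \qquad \{P\{x/N\} \Downarrow_{CN} \mathcal{E}_{P,N}\}_{\lambda x.P \in \mathsf{S}(\mathcal{D})}}{MN \Downarrow_{CN} \sum_{\lambda x.P \in \mathsf{S}(\mathcal{D})} \mathcal{D}(\lambda x.P) \cdot \mathcal{E}_{P,N}}\ ba_N \qquad \frac{M \Downarrow_{CN} \mathcal{D} \qquad N \Downarrow_{CN} \mathcal{E}}{M \oplus N \Downarrow_{CN} \frac{1}{2} \cdot \mathcal{D} + \frac{1}{2} \cdot \mathcal{E}}\ bs_N$$

$\mathcal{B}_{CN}(M)$ is simply the subdistribution $\inf_{M \Downarrow_{CN} \mathcal{D}} \mathcal{D}$. The way binary choices are managed reflects the reduction rules, which allow a binary choice to be evaluated to one of its components even if the latter are not values. Indeed, while in call-by-value the normalization factors $\sum \mathcal{E}$ and $\sum \mathcal{D}$ were necessary, they are not needed here anymore.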

Example 6 Consider the term $M = \Omega \oplus (\lambda x.x)$. Recall from the discussion of $\Omega$ above that $\Omega \Downarrow_{CV} \mathcal{D}$ for every $\mathcal{D}$. Moreover, $\lambda x.x \Downarrow_{CV} \{(\lambda x.x)^1\}$. This implies $M \Downarrow_{CV} \emptyset$ and, as a consequence, that $\mathcal{B}_{CV}(M) = \emptyset$. The same behavior cannot be mimicked in call-by-name. Indeed, while $\Omega \Downarrow_{CN} \mathcal{D}$ (for every $\mathcal{D}$) and $\lambda x.x \Downarrow_{CN} \{(\lambda x.x)^1\}$, the smallest distribution which can be assigned to $M$ is $\mathcal{B}_{CN}(M) = \{(\lambda x.x)^{\frac{1}{2}}\}$.

Inductive call-by-name big-step semantics can be obtained by adding the rule which assigns the empty distribution to any term and taking the inductive interpretation of the system.
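To make the contrast in Example 6 concrete, here is a minimal Python sketch, assuming a tuple encoding of closed $\Lambda_\oplus$ terms chosen only for illustration. `step_cbv` follows the call-by-value reduction as it can be read off the case analysis in the proof of Lemma 2 (a sum evaluates its left and then its right component to a value before choosing), `step_cbn` follows Definition 8, and `approx` unfolds rules $se$, $sv$ and $sm$ up to a bound `k` on nested uses of $sm$, so that the small-step semantics is the supremum over `k`. Since small-step and big-step semantics coincide (Theorem 2, and Theorem 4 below for call-by-name), the results can be compared with $\mathcal{B}_{CV}(M)$ and $\mathcal{B}_{CN}(M)$. All function names are hypothetical.

```python
from fractions import Fraction

# Hypothetical encoding of closed terms: ('var', x), ('lam', x, body),
# ('app', m, n), ('sum', m, n). Only closed terms are reduced, so every
# substituted argument is closed and naive substitution cannot capture.

def is_value(t):
    return t[0] in ('var', 'lam')

def subst(t, x, s):
    """t{x/s}, assuming s is closed."""
    if t[0] == 'var':
        return s if t[1] == x else t
    if t[0] == 'lam':
        return t if t[1] == x else ('lam', t[1], subst(t[2], x, s))
    return (t[0], subst(t[1], x, s), subst(t[2], x, s))

def step_cbv(t):
    """Call-by-value: successors of t (left component first, then right)."""
    if is_value(t):
        return []
    m, n = t[1], t[2]
    if not is_value(m):
        return [(t[0], r, n) for r in step_cbv(m)]
    if not is_value(n):
        return [(t[0], m, r) for r in step_cbv(n)]
    if t[0] == 'sum':
        return [m, n]                          # V (+) W reduces to V, W
    return [subst(m[2], m[1], n)] if m[0] == 'lam' else []

def step_cbn(t):
    """Leftmost reduction of Definition 8."""
    if t[0] == 'sum':
        return [t[1], t[2]]                    # M (+) N reduces to M, N
    if t[0] == 'app':
        m, n = t[1], t[2]
        if m[0] == 'lam':
            return [subst(m[2], m[1], n)]
        return [('app', r, n) for r in step_cbn(m)]
    return []

def approx(t, k, step):
    """Greatest D derivable for t with at most k nested uses of rule sm."""
    if is_value(t):
        return {t: Fraction(1)}                # rule sv
    succ = step(t) if k > 0 else []            # rule se once the bound is hit
    dist = {}
    for s in succ:
        for v, p in approx(s, k - 1, step).items():
            dist[v] = dist.get(v, Fraction(0)) + p / len(succ)
    return dist
```

With $M = \Omega \oplus (\lambda x.x)$, `approx(M, k, step_cbv)` is empty for every `k`, because the left component never reaches a value, while `approx(M, k, step_cbn)` assigns probability $1/2$ to $\lambda x.x$ as soon as `k` is positive.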
6.3 Comparing the Different Notions

We retrace here the results of Section 5.3 and Section 5.4 for call-by-name semantics.



6.3.1 Divergence and Convergence in CbN Small-step Semantics 

Convergence and divergence call-by-name small-step semantics are proved equivalent by means of the following results.

Proposition 4 If $M \Rightarrow^{p}_{\infty N}$ and $M \Rightarrow_{IN} \mathcal{D}$, then $p + \sum \mathcal{D} \le 1$.



6.3.2 Small-step vs. Big-step 

We prove here the equivalence between inductive call-by-name small-step semantics $\mathcal{S}_{IN}(M)$ and coinductive call-by-name big-step semantics $\mathcal{B}_{CN}(M)$. We use the same proof technique as in Section 5.4.

Given any term $M$, the sets of probability distributions which can be attributed to $M$ in call-by-name small-step semantics and in call-by-name big-step semantics are respectively defined as $\mathcal{M}^s_N = \{\mathcal{D} \mid M \Rightarrow_{IN} \mathcal{D}\}$ and $\mathcal{M}^b_N = \{\mathcal{D} \mid M \Downarrow_{CN} \mathcal{D}\}$. Firstly, we prove that big-step semantics dominates small-step semantics (Proposition 5), and this implies that $\mathcal{B}_{CN}(M) \ge \mathcal{S}_{IN}(M)$. Then we prove again that small-step semantics can itself be derived using big-step semantics, namely that $\mathcal{S}_{IN}(M) \in \mathcal{M}^b_N$ (Proposition 6). As a consequence, $\mathcal{B}_{CN}(M) \le \mathcal{S}_{IN}(M)$ and finally $\mathcal{B}_{CN}(M) = \mathcal{S}_{IN}(M)$.

Lemma 6 If $I$ is finite and $\pi_i : M \Rightarrow_{IN} \mathcal{D}_i$ for every $i \in I$, then there is $\rho : M \Rightarrow_{IN} \sup_{i \in I} \mathcal{D}_i$. Moreover, $\rho \preceq \pi_i$ for some $i \in I$.

Proof. The proof goes by induction on the structure of the proofs in the family $\{\pi_i\}_{i \in I}$ (which can actually be done, since $I$ is finite). □

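Lemma 7 If $\pi : M \Rightarrow_{IN} \mathcal{D}$, then at least one of the following conditions holds:

1. $\mathcal{D} = \emptyset$;

2. $M$ is a value $V$ and $\mathcal{D} = \{V^1\}$;

3. $M$ is an application $NL$ and there is a (finite) distribution $\mathcal{E}$ and, for every $\lambda x.P \in \mathsf{S}(\mathcal{E})$, a distribution $\mathcal{F}_{P,L}$ such that:

   1. $\rho : N \Rightarrow_{IN} \mathcal{E}$ and $\mu_{P,L} : P\{x/L\} \Rightarrow_{IN} \mathcal{F}_{P,L}$;

   2. $\rho, \mu_{P,L} \prec \pi$;

   3. $\mathcal{D} \le \sum_{\lambda x.P \in \mathsf{S}(\mathcal{E})} \mathcal{E}(\lambda x.P) \cdot \mathcal{F}_{P,L}$;

4. $M$ is a sum $N \oplus L$ and there are (finite) distributions $\mathcal{E}$ and $\mathcal{F}$ such that:

   1. $\rho : N \Rightarrow_{IN} \mathcal{E}$ and $\xi : L \Rightarrow_{IN} \mathcal{F}$;

   2. $\rho, \xi \prec \pi$;

   3. $\mathcal{D} \le \frac{1}{2} \cdot \mathcal{E} + \frac{1}{2} \cdot \mathcal{F}$.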



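Proof. This is an induction on derivations $\pi$ for $M \Rightarrow_{IN} \mathcal{D}$. The cases where $\pi$ is obtained by the rules without premises are trivial. So, we can assume that $\pi$ has the form

$$\frac{M \mapsto_N \mathbf{Q} \qquad \{\pi_i : Q_i \Rightarrow_{IN} \mathcal{D}_i\}_{i \in [1,n]}}{M \Rightarrow_{IN} \sum_{i=1}^n \frac{1}{n} \mathcal{D}_i}\ sm_N$$

where $\mathbf{Q}$ is $Q_1, \ldots, Q_n$. Let us distinguish some cases depending on how the premise $M \mapsto_N \mathbf{Q}$ is derived: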




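• Suppose $M = NL$ and that $N \mapsto_N \mathbf{R}$. Then $\mathbf{Q} = \mathbf{R}L$. From the induction hypothesis applied to the derivations $\pi_i : R_iL \Rightarrow_{IN} \mathcal{D}_i$ we obtain that:

  • Either all of the $\mathcal{D}_i$ are $\emptyset$. In this case $\mathcal{D}$ is itself $\emptyset$, and a derivation for $M \Rightarrow_{IN} \emptyset$ can be defined.

  • Or there is at least one among the derivations $\pi_i$ to which case 3 applies. Suppose that $k_1, \ldots, k_m$ are the indices in $[1,n]$ to which case 3 can be applied (for the other indices, case 1 applies). This means that for every $i \in \{1, \ldots, m\}$ there is a distribution $\mathcal{E}_i$ and, for every $\lambda x.P \in \mathsf{S}(\mathcal{E}_i)$, a distribution $\mathcal{F}^i_{P,L}$ such that:

  $$\rho_i : R_{k_i} \Rightarrow_{IN} \mathcal{E}_i; \qquad \mu^i_{P,L} : P\{x/L\} \Rightarrow_{IN} \mathcal{F}^i_{P,L}.$$

  Moreover, $\rho_i, \mu^i_{P,L} \prec \pi_{k_i}$, and

  $$\mathcal{D}_{k_i} \le \sum_{\lambda x.P \in \mathsf{S}(\mathcal{E}_i)} \mathcal{E}_i(\lambda x.P) \cdot \mathcal{F}^i_{P,L}.$$

  Now, define $\mathcal{E}$ as the distribution

  $$\mathcal{E} = \sum_{i=1}^m \frac{1}{n} \mathcal{E}_i.$$

  Note that $\mathsf{S}(\mathcal{E}) = \bigcup_{i=1}^m \mathsf{S}(\mathcal{E}_i)$. Clearly, a derivation $\rho$ for $N \Rightarrow_{IN} \mathcal{E}$ can be defined such that $\rho \prec \pi$: simply construct it from the derivations $\rho_i$ and some derivations for $R_j \Rightarrow_{IN} \emptyset$. Moreover, derivations $\mu_{P,L}$ can be defined for every $\lambda x.P \in \mathsf{S}(\mathcal{E})$ in such a way that $\mu_{P,L} \prec \pi$ (use Lemma 6), $\mu_{P,L} : P\{x/L\} \Rightarrow_{IN} \mathcal{F}_{P,L}$ and $\mathcal{F}_{P,L} = \sup_i \mathcal{F}^i_{P,L}$. Now:

  $$\begin{aligned}
  \mathcal{D} = \sum_{i=1}^m \frac{1}{n} \mathcal{D}_{k_i}
  &\le \sum_{i=1}^m \frac{1}{n} \Big( \sum_{\lambda x.P \in \mathsf{S}(\mathcal{E}_i)} \mathcal{E}_i(\lambda x.P) \cdot \mathcal{F}^i_{P,L} \Big)\\
  &\le \sum_{\lambda x.P \in \mathsf{S}(\mathcal{E})} \Big( \sum_{i=1}^m \frac{1}{n} \mathcal{E}_i(\lambda x.P) \Big) \cdot \mathcal{F}_{P,L}\\
  &= \sum_{\lambda x.P \in \mathsf{S}(\mathcal{E})} \mathcal{E}(\lambda x.P) \cdot \mathcal{F}_{P,L}.
  \end{aligned}$$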

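• Suppose $M = NL$ and that we are in presence of a redex, i.e., $N$ is of the form $\lambda x.R$. Then $\mathcal{E}$ is simply $\{(\lambda x.R)^1\}$. Taking the premise $\pi_1 : R\{x/L\} \Rightarrow_{IN} \mathcal{D}$ as $\mu_{R,L}$, with $\mathcal{F}_{R,L} = \mathcal{D}$, the thesis easily follows.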

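• Suppose $M = N \oplus L$ and that $M \mapsto_N N, L$. In this case the subderivations which we are looking for are $\rho = \pi_1 : N \Rightarrow_{IN} \mathcal{D}_1$ and $\xi = \pi_2 : L \Rightarrow_{IN} \mathcal{D}_2$, since $\mathcal{D} = \frac{1}{2} \cdot \mathcal{D}_1 + \frac{1}{2} \cdot \mathcal{D}_2$.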

This concludes the proof. □ 

Proposition 5 If $M \Rightarrow_{IN} \mathcal{D}$ and $M \Downarrow_{CN} \mathcal{K}$, then $\mathcal{D} \le \mathcal{K}$.

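Proof. By induction on the structure of a proof for $M \Rightarrow_{IN} \mathcal{D}$, applying Lemma 7 and doing some case analysis based on its outcome:

• If $\mathcal{D} = \emptyset$, then $\mathcal{D} \le \mathcal{K}$ trivially.

• If $M = V$ and $\mathcal{D} = \{V^1\}$, then $\mathcal{K} = \mathcal{D}$, because the only rule for values in the big-step semantics is

$$\frac{}{V \Downarrow_{CN} \{V^1\}}\ bv_N$$

• If $M$ is an application $NL$ and distributions $\mathcal{E}, \mathcal{F}_{P,L}$ and derivations $\rho, \mu_{P,L}$ exist as in Lemma 7, observe that $M \Downarrow_{CN} \mathcal{K}$ must be the conclusion of an instance of rule $ba_N$ with premises $N \Downarrow_{CN} \mathcal{X}$ and $P\{x/L\} \Downarrow_{CN} \mathcal{Z}_{P,L}$ for every $\lambda x.P \in \mathsf{S}(\mathcal{X})$, where $\mathcal{K} = \sum_{\lambda x.P \in \mathsf{S}(\mathcal{X})} \mathcal{X}(\lambda x.P) \cdot \mathcal{Z}_{P,L}$. By induction hypothesis, $\mathcal{X} \ge \mathcal{E}$, hence $\mathsf{S}(\mathcal{X}) \supseteq \mathsf{S}(\mathcal{E})$. Again by induction hypothesis, we obtain that $\mathcal{Z}_{P,L} \ge \mathcal{F}_{P,L}$ for every $\lambda x.P \in \mathsf{S}(\mathcal{E})$. Finally:

$$\mathcal{D} \le \sum_{\lambda x.P \in \mathsf{S}(\mathcal{E})} \mathcal{E}(\lambda x.P) \cdot \mathcal{F}_{P,L} \le \sum_{\lambda x.P \in \mathsf{S}(\mathcal{X})} \mathcal{X}(\lambda x.P) \cdot \mathcal{Z}_{P,L} = \mathcal{K}.$$

• If $M$ is $N \oplus L$, we can proceed exactly as in the sum case of Proposition 2.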

This concludes the proof. □ 

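Lemma 8 Let $M \in \Lambda_\oplus$ be any term. Then:

1. If $M$ is a value $V$ and $M \Rightarrow_{IN} \mathcal{D}$, then $\mathcal{D} \le \{V^1\}$.

2. If $M$ is an application $NL$, $N \Rightarrow_{IN} \mathcal{E}$ and $\{P\{x/L\} \Rightarrow_{IN} \mathcal{G}_P\}_{\lambda x.P \in \mathsf{S}(\mathcal{E})}$, then there exists a distribution $\mathcal{D}$ such that $M \Rightarrow_{IN} \mathcal{D}$ and $\mathcal{D} \ge \sum_{\lambda x.P \in \mathsf{S}(\mathcal{E})} \mathcal{E}(\lambda x.P) \cdot \mathcal{G}_P$.

3. If $M$ is a sum $N \oplus L$, $N \Rightarrow_{IN} \mathcal{E}$ and $L \Rightarrow_{IN} \mathcal{F}$, then there exists a distribution $\mathcal{D}$ such that $M \Rightarrow_{IN} \mathcal{D}$ and $\mathcal{D} \ge \frac{1}{2} \cdot \mathcal{E} + \frac{1}{2} \cdot \mathcal{F}$.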

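Proof. Let us prove the three statements separately:

• If $M$ is a value $V$, then the only possible judgments involving $M$ are $M \Rightarrow_{IN} \{V^1\}$ and $M \Rightarrow_{IN} \emptyset$. The thesis trivially holds.

• If $M$ is an application $NL$, then we prove statement 2 by induction on the derivation for $N \Rightarrow_{IN} \mathcal{E}$. Let us distinguish some cases:

  • If $N \Rightarrow_{IN} \emptyset$, then $\sum \mathcal{E}(\lambda x.P) \cdot \mathcal{G}_P = \emptyset$. We derive $M \Rightarrow_{IN} \emptyset$ by means of rule $se_N$ and the thesis holds.

  • If $N \Rightarrow_{IN} \mathcal{E} \neq \emptyset$, suppose that the last rule applied in the derivation is

  $$\frac{N \mapsto_N \mathbf{R} \qquad R_i \Rightarrow_{IN} \mathcal{E}_i}{N \Rightarrow_{IN} \sum_{i=1}^n \frac{1}{n} \mathcal{E}_i}\ sm_N$$

  Note that $\mathcal{E} = \sum_{i=1}^n \frac{1}{n} \mathcal{E}_i$. It is possible to apply the induction hypothesis on every $R_iL$: since $R_i \Rightarrow_{IN} \mathcal{E}_i$ and $P\{x/L\} \Rightarrow_{IN} \mathcal{G}_P$ (for $\lambda x.P \in \mathsf{S}(\mathcal{E}_i) \subseteq \mathsf{S}(\mathcal{E})$), it follows that $R_iL \Rightarrow_{IN} \mathcal{H}_i$ with $\mathcal{H}_i \ge \sum_{\lambda x.P \in \mathsf{S}(\mathcal{E}_i)} \mathcal{E}_i(\lambda x.P) \cdot \mathcal{G}_P$. We are able to construct a derivation in the following way:

  $$\frac{NL \mapsto_N \mathbf{R}L \qquad R_iL \Rightarrow_{IN} \mathcal{H}_i}{NL \Rightarrow_{IN} \sum_{i=1}^n \frac{1}{n} \mathcal{H}_i}\ sm_N$$

  And finally, taking $\mathcal{D} = \sum_{i=1}^n \frac{1}{n} \mathcal{H}_i$, we have:

  $$\begin{aligned}
  \mathcal{D} = \sum_{i=1}^n \frac{1}{n} \mathcal{H}_i
  &\ge \sum_{i=1}^n \frac{1}{n} \Big( \sum_{\lambda x.P \in \mathsf{S}(\mathcal{E}_i)} \mathcal{E}_i(\lambda x.P) \cdot \mathcal{G}_P \Big)\\
  &= \sum_{\lambda x.P \in \mathsf{S}(\mathcal{E})} \Big( \sum_{i=1}^n \frac{1}{n} \mathcal{E}_i(\lambda x.P) \Big) \cdot \mathcal{G}_P\\
  &= \sum_{\lambda x.P \in \mathsf{S}(\mathcal{E})} \mathcal{E}(\lambda x.P) \cdot \mathcal{G}_P.
  \end{aligned}$$

  • Other cases can be handled similarly to the previous one.

• If $M$ is a sum $N \oplus L$, the proof is trivial: since $M \mapsto_N N, L$, rule $sm_N$ directly gives $M \Rightarrow_{IN} \frac{1}{2} \cdot \mathcal{E} + \frac{1}{2} \cdot \mathcal{F}$.

□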



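Lemma 9 For each term $M \in \Lambda_\oplus$:

1. If $M$ is a value $V$, then $\mathcal{S}_{IN}(M) = \{V^1\}$;

2. If $M$ is an application $NL$, then $\mathcal{S}_{IN}(M) = \sum_{\lambda x.P \in \mathsf{S}(\mathcal{S}_{IN}(N))} \mathcal{S}_{IN}(N)(\lambda x.P) \cdot \mathcal{S}_{IN}(P\{x/L\})$;

3. If $M$ is a sum $N \oplus L$, then $\mathcal{S}_{IN}(M) = \frac{1}{2} \cdot \mathcal{S}_{IN}(N) + \frac{1}{2} \cdot \mathcal{S}_{IN}(L)$.

Proof. The two inequalities can be proved separately:

• If $M$ is a value, the thesis follows from the small-step semantics rules. Indeed, $M \Rightarrow_{IN} \{M^1\}$, and the only other judgment derivable for $M$ is $M \Rightarrow_{IN} \emptyset$.

• For the other cases, the ($\le$) direction follows from Lemma 7 and Fact 1 (with $\mapsto_N$ in place of $\mapsto_V$), while the ($\ge$) direction follows from Lemma 8 and Fact 1.

□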
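For comparison with the call-by-value case, the equations of Lemma 9 can be sketched in the same style, again assuming dictionaries from hypothetical value labels to `Fraction` weights and a caller-supplied `sem_body` standing for $\mathcal{S}_{IN}(P\{x/L\})$. The structural differences are that the argument $L$ is passed unevaluated, so no distribution for it appears, and that the sum no longer carries the normalization factors $\sum \mathcal{S}_{IN}(N)$ and $\sum \mathcal{S}_{IN}(L)$.

```python
from fractions import Fraction

# Distributions are dicts from (hypothetical) value labels to Fraction weights.

def cbn_sum(dN, dL):
    """Lemma 9(3): S(N (+) L) = 1/2 * S(N) + 1/2 * S(L)."""
    out = {}
    for d in (dN, dL):
        for v, p in d.items():
            out[v] = out.get(v, Fraction(0)) + Fraction(1, 2) * p
    return out

def cbn_app(dN, sem_body):
    """Lemma 9(2): sum over lam.P in S(S(N)) of S(N)(lam.P) * S(P{x/L}).

    sem_body(lam) is a caller-supplied stand-in for S(P{x/L}); the argument L
    is substituted unevaluated, so no distribution for L is needed here."""
    out = {}
    for lam, p in dN.items():
        for v, q in sem_body(lam).items():
            out[v] = out.get(v, Fraction(0)) + p * q
    return out
```

With $\mathcal{S}_{IN}(\Omega) = \emptyset$ and $\mathcal{S}_{IN}(\lambda x.x) = \{(\lambda x.x)^1\}$, `cbn_sum` yields $\{(\lambda x.x)^{1/2}\}$, which is $\mathcal{B}_{CN}(M)$ in Example 6.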

Proposition 6 $M \Downarrow_{CN} \mathcal{S}_{IN}(M)$.

Proof. We will prove the thesis by coinduction: we prove that all judgments $M \Downarrow_{CN} \mathcal{S}_{IN}(M)$ belong to the coinductive interpretation of the underlying formal system $\Phi$ (in this case, $\Phi = \{bv_N, ba_N, bs_N\}$). To do that, we need to prove that the set $A$ of all those judgments is consistent, i.e., that $A \subseteq F_\Phi(A)$. This amounts to showing that if $c \in A$, then there is a derivation for $c$ whose immediate premises are themselves in $A$. Let us distinguish some cases:

• If $M = V$, then $V \Downarrow_{CN} \{V^1\}$ by rule $bv_N$, and $\{V^1\} = \mathcal{S}_{IN}(V)$ because of Lemma 9.

• If $M$ is an application $NL$, take the judgment $c_1 = N \Downarrow_{CN} \mathcal{S}_{IN}(N)$ and the family of judgments $\{P\{x/L\} \Downarrow_{CN} \mathcal{S}_{IN}(P\{x/L\})\}_{\lambda x.P \in \mathsf{S}(\mathcal{S}_{IN}(N))}$: we will prove that the judgment $NL \Downarrow_{CN} \mathcal{S}_{IN}(NL)$ can be derived in a single step from $c_1$ and those in the family above by means of rule $ba_N$. Simply observe that

$$\frac{N \Downarrow_{CN} \mathcal{S}_{IN}(N) \qquad \{P\{x/L\} \Downarrow_{CN} \mathcal{S}_{IN}(P\{x/L\})\}_{\lambda x.P \in \mathsf{S}(\mathcal{S}_{IN}(N))}}{NL \Downarrow_{CN} \sum_{\lambda x.P \in \mathsf{S}(\mathcal{S}_{IN}(N))} \mathcal{S}_{IN}(N)(\lambda x.P) \cdot \mathcal{S}_{IN}(P\{x/L\})}\ ba_N$$

The thesis follows by applying Lemma 9.

• If $M$ is a sum $N \oplus L$, take the judgments $c_1 = N \Downarrow_{CN} \mathcal{S}_{IN}(N)$ and $c_2 = L \Downarrow_{CN} \mathcal{S}_{IN}(L)$: we will prove that the judgment $N \oplus L \Downarrow_{CN} \mathcal{S}_{IN}(N \oplus L)$ can be inferred in a single step from $c_1$ and $c_2$ by means of rule $bs_N$. Clearly, $c_1$ and $c_2$ belong to $A$. Moreover

$$\frac{N \Downarrow_{CN} \mathcal{S}_{IN}(N) \qquad L \Downarrow_{CN} \mathcal{S}_{IN}(L)}{N \oplus L \Downarrow_{CN} \frac{1}{2} \cdot \mathcal{S}_{IN}(N) + \frac{1}{2} \cdot \mathcal{S}_{IN}(L)}\ bs_N$$

and by Lemma 9, case 3, we obtain the thesis.

This concludes the proof. □
Theorem 4 $\mathcal{B}_{CN}(M) = \mathcal{S}_{IN}(M)$.

Proof. This is a corollary of Proposition 6 and Proposition 5. □

7 CPS Translations and Simulations 

In this section we show that in $\Lambda_\oplus$ it is possible to simulate call-by-value by call-by-name and vice versa. To do that, we follow Plotkin's CPS translation [17], extended to accommodate binary choice.

It is well known that in the weak untyped lambda calculus, call-by-value and call-by-name are not equivalent notions of reduction. Moreover, the presence of binary choice exacerbates the confluence problem, as shown in Section 2. Duplication clearly plays a central role: in particular, the order in which probabilistic choices and duplications are performed matters.

The presence of binary choices opens a related question about the possibility of including some administrative rules distributing sums over the other constructs. The following example shows that including administrative rules may be problematic:

Example 7 Let us consider the following terms: $M = \lambda x.x(M_\top \oplus M_\bot)$ and $N = (\lambda x.xM_\top) \oplus (\lambda x.xM_\bot)$. Note that $M$ can be obtained from $N$ by means of administrative rules like $L(P \oplus Q) = (LP) \oplus (LQ)$. Consider the term $N_{xor} = \lambda z.M_{xor}zz$. We have $\mathcal{S}_{IV}(MN_{xor}) = \{M_\bot^1\} = \mathcal{D}$ and $\mathcal{S}_{IN}(MN_{xor}) = \{M_\top^{\frac{1}{2}}, M_\bot^{\frac{1}{2}}\} = \mathcal{E}$, whereas $\mathcal{S}_{IV}(NN_{xor}) = \mathcal{S}_{IN}(NN_{xor}) = \mathcal{D}$. Therefore $M$ and $N$ have different call-by-name observational behaviors, and thus $M$ and $N$ cannot be considered equivalent terms.
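The effect behind Example 7 can be reproduced without encoding booleans as λ-terms. The Python sketch below is an abstraction, not the terms of the example: `True` and `False` stand for $M_\top$ and $M_\bot$, a fair choice between them stands for $M_\top \oplus M_\bot$, and `!=` stands for $M_{xor}$. Under call-by-value the choice is resolved once and the resulting value is duplicated; under call-by-name the unevaluated choice is duplicated and each copy is resolved independently.

```python
from fractions import Fraction
from itertools import product

# Abstraction of Example 7: True/False stand for M_top/M_bot, a fair choice
# between them for M_top (+) M_bot, and != for M_xor.
COIN = (True, False)

def add(dist, outcome, p):
    dist[outcome] = dist.get(outcome, Fraction(0)) + p

def cbv_xor_self():
    """CbV: the choice is resolved once, then the chosen value is duplicated."""
    dist = {}
    for b in COIN:
        add(dist, b != b, Fraction(1, len(COIN)))
    return dist

def cbn_xor_self():
    """CbN: the unevaluated choice is duplicated; each copy is resolved independently."""
    dist = {}
    for b1, b2 in product(COIN, repeat=2):
        add(dist, b1 != b2, Fraction(1, len(COIN) ** 2))
    return dist
```

`cbv_xor_self()` gives all the probability to `False`, mirroring $\mathcal{S}_{IV}(MN_{xor})$, while `cbn_xor_self()` splits it evenly, mirroring $\mathcal{S}_{IN}(MN_{xor})$. In $NN_{xor}$ the choice is instead between two abstractions, so under both strategies it is resolved before $N_{xor}$ receives its single argument, which corresponds to `cbv_xor_self`.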

A study of the observational behavior of terms is a fascinating subject, but it is outside the scope of this paper. Incidentally, it is extensively investigated in the nondeterministic setting [6], in which an algebraic semantics of terms is defined by way of a generalization of Böhm trees. A further generalization to the probabilistic setting is left to future work.

What we are interested in here is an operational study of $\Lambda_\oplus$. An interesting question is clearly whether call-by-value and call-by-name, although being distinct notions of reduction, can somehow be made equivalent through a suitable CPS translation, even in the presence of binary choices (with a probabilistic semantics).

In this section, a simulation between call-by-value and call-by-name in $\Lambda_\oplus$ is proved. We begin with the simulation of call-by-value by call-by-name (Section 7.1), then we will carry on with the simulation of call-by-name by call-by-value (Section 7.2).



7.1 Simulating Call-by-Value with Call-by-Name

Suppose we extend $\Lambda_\oplus$ with a denumerable set of continuation variables $\mathcal{C} = \{\alpha, \beta, \varepsilon, \ldots\}$, disjoint from the original set $\mathcal{X}$ of variables of the language. We will call $\Lambda_\oplus^{+}$ the language of lambda terms extended this way. All definitions and constructions on $\Lambda_\oplus$ (including its operational semantics) extend smoothly to $\Lambda_\oplus^{+}$.

Call-by-value reduction on $\Lambda_\oplus$ can be simulated by call-by-name reduction on $\Lambda_\oplus^{\mathcal{C}}$, by translating
every term $M$ of $\Lambda_\oplus$ into a term of $\Lambda_\oplus^{\mathcal{C}}$ which will be proved equivalent to $M$ in a precise
sense (Theorem 5).



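Definition 9 (Call-by-value Translation) The translation map $\llbracket\cdot\rrbracket$ from $\Lambda_\oplus$ to $\Lambda_\oplus^{\mathcal{C}}$ is
recursively defined as follows:

$$
\begin{aligned}
\llbracket x\rrbracket &= \lambda\varepsilon.\varepsilon x\\
\llbracket\lambda x.M\rrbracket &= \lambda\varepsilon.\varepsilon(\lambda x.\llbracket M\rrbracket)\\
\llbracket MN\rrbracket &= \lambda\varepsilon.\llbracket M\rrbracket(\lambda\alpha.\llbracket N\rrbracket(\lambda\beta.\alpha\beta\varepsilon))\\
\llbracket M\oplus N\rrbracket &= \lambda\varepsilon.\llbracket M\rrbracket(\lambda\alpha.\llbracket N\rrbracket(\lambda\beta.((\lambda\gamma.\gamma\alpha)\oplus(\lambda\gamma.\gamma\beta))\varepsilon))
\end{aligned}
$$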
As expected, call-by-value is simulated by way of so-called continuations. As a consequence,
(call-by-value) reduction of $M$ is not simulated simply by reducing $\llbracket M\rrbracket$ (in call-by-name), but
by feeding it the identity continuation $\lambda x.x$. Furthermore, this does not yield the
same value(s) we would obtain by evaluating $M$, but values related to them by a
function $\Psi$, which sends values in $\Lambda_\oplus$ to values in $\Lambda_\oplus^{\mathcal{C}}$ as follows:

$$\Psi(x) = x; \qquad \Psi(\lambda x.M) = \lambda x.\llbracket M\rrbracket.$$

Clearly, $\Psi$ can be naturally extended to a map on distributions (of values).

The rest of this section is devoted to proving the following theorem, which retraces in $\Lambda_\oplus$ the
classic result from [17]:

Theorem 5 (Simulation) For every $M$, $\Psi(\mathscr{S}_{\mathsf{v}}(M)) = \mathscr{S}_{\mathsf{n}}(\llbracket M\rrbracket(\lambda x.x))$.
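Theorem 5 can be spot-checked on small closed terms. The Python sketch below transcribes Definition 9 and the map $\Psi$, evaluates $M$ under call-by-value and $\llbracket M\rrbracket(\lambda x.x)$ under call-by-name, and compares the two distributions. The tuple encoding, the Greek strings used as continuation variables, the fuel bound, the strict treatment of $\oplus$ under call-by-value (read off the cases of Lemma 13) and the boolean terms borrowed from Example 7 are illustrative assumptions.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical tuple encoding: ("var", x) | ("lam", x, M) | ("app", M, N) | ("plus", M, N)
def var(x): return ("var", x)
def lam(x, body): return ("lam", x, body)
def plus(m, n): return ("plus", m, n)
def app(*ts):                                   # left-associated application
    t = ts[0]
    for u in ts[1:]:
        t = ("app", t, u)
    return t

# Continuation variables get Greek names, disjoint from the ASCII source variables.
EPS, ALPHA, BETA, GAMMA = "ε", "α", "β", "γ"

def cps(m):
    """Definition 9: the call-by-value translation [[M]]."""
    if m[0] in ("var", "lam"):                  # [[V]] = λε.εΨ(V)
        return lam(EPS, app(var(EPS), psi(m)))
    if m[0] == "app":                           # innermost continuation λβ.αβε
        k = lam(BETA, app(var(ALPHA), var(BETA), var(EPS)))
    else:                                       # innermost continuation λβ.((λγ.γα)⊕(λγ.γβ))ε
        k = lam(BETA, app(plus(lam(GAMMA, app(var(GAMMA), var(ALPHA))),
                               lam(GAMMA, app(var(GAMMA), var(BETA)))), var(EPS)))
    return lam(EPS, app(cps(m[1]), lam(ALPHA, app(cps(m[2]), k))))

def psi(v):
    """Psi(x) = x and Psi(λx.M) = λx.[[M]]."""
    return v if v[0] == "var" else lam(v[1], cps(v[2]))

def subst(t, x, s):
    """t{s/x} for a closed s (so no capture can occur)."""
    if t[0] == "var":
        return s if t[1] == x else t
    if t[0] == "lam":
        return t if t[1] == x else lam(t[1], subst(t[2], x, s))
    return (t[0], subst(t[1], x, s), subst(t[2], x, s))

def evaluate(t, by_value, fuel=500):
    """Exact (sub)distribution of values of a closed term; fuel bounds beta steps per path."""
    out = defaultdict(Fraction)
    if t[0] == "lam":
        out[t] += 1
    elif t[0] == "plus" and by_value:           # CBV: both operands first, then a fair choice
        for l, p in evaluate(t[1], True, fuel).items():
            for r, q in evaluate(t[2], True, fuel).items():
                out[l] += p * q / 2
                out[r] += p * q / 2
    elif t[0] == "plus":                        # CBN: fair choice, then evaluate the branch
        for branch in (t[1], t[2]):
            for v, p in evaluate(branch, False, fuel).items():
                out[v] += p / 2
    elif t[0] == "app" and fuel > 0:
        for f, p in evaluate(t[1], by_value, fuel).items():
            args = evaluate(t[2], True, fuel).items() if by_value else [(t[2], 1)]
            for a, q in args:
                for v, r in evaluate(subst(f[2], f[1], a), by_value, fuel - 1).items():
                    out[v] += p * q * r
    return dict(out)

I = lam("x", var("x"))
T = lam("x", lam("y", var("x")))
F = lam("x", lam("y", var("y")))
XOR = lam("a", lam("b", app(var("a"), app(var("b"), F, T), var("b"))))
N_XOR = lam("z", app(XOR, var("z"), var("z")))
M = lam("x", app(var("x"), plus(T, F)))

def simulates(term):
    """Check Theorem 5 on one closed term: Psi(S_v(M)) == S_n([[M]](λx.x))."""
    lhs = {psi(v): p for v, p in evaluate(term, True).items()}
    return lhs == evaluate(app(cps(term), I), False)

print([simulates(t) for t in (plus(T, F), app(I, plus(T, F)), app(M, N_XOR))])
```

On these three terms the check prints `[True, True, True]`; the comparison is syntactic, which works here because substituting closed terms never renames bound variables.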

We now turn to the proof, which requires the following fact and substitution lemma.

Fact 2 $(M\oplus N)\{V/x\} = M\{V/x\}\oplus N\{V/x\}$.



Lemma 10 extends Plotkin's substitution lemma (see [17], Lemma 1, page 149).

Lemma 10 (Substitution) $\llbracket M\rrbracket\{\Psi(V)/x\} = \llbracket M\{V/x\}\rrbracket$.

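Proof. By induction on the size of $M$. An interesting case:
• If $M$ is $N\oplus L$, then

$$
\begin{aligned}
\llbracket N\oplus L\rrbracket\{\Psi(V)/x\} &= \lambda\varepsilon.\llbracket N\rrbracket\{\Psi(V)/x\}(\lambda\alpha.\llbracket L\rrbracket\{\Psi(V)/x\}(\lambda\beta.((\lambda\gamma.\gamma\alpha)\oplus(\lambda\gamma.\gamma\beta))\varepsilon))\\
&= \lambda\varepsilon.\llbracket N\{V/x\}\rrbracket(\lambda\alpha.\llbracket L\{V/x\}\rrbracket(\lambda\beta.((\lambda\gamma.\gamma\alpha)\oplus(\lambda\gamma.\gamma\beta))\varepsilon)) && \text{(induction hypothesis)}\\
&= \llbracket N\{V/x\}\oplus L\{V/x\}\rrbracket = \llbracket (N\oplus L)\{V/x\}\rrbracket && \text{(Fact 2)}
\end{aligned}
$$

Other cases are very similar to the previous one. This concludes the proof. □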

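We now define the suitable extension of the infix operator ":" introduced by Plotkin in [17].

Definition 10 (Infix operator ":" for $\llbracket\cdot\rrbracket$) The infix operator ":" for the map $\llbracket\cdot\rrbracket$ is defined
as follows:

$$
\begin{aligned}
V : K &= K\Psi(V)\\
LP : K &= L : (\lambda\alpha.\llbracket P\rrbracket(\lambda\beta.\alpha\beta K)) && \text{if } L\notin\mathsf{Val}\\
VL : K &= L : (\lambda\beta.\Psi(V)\beta K) && \text{if } L\notin\mathsf{Val}\\
VW : K &= \Psi(V)\Psi(W)K\\
L\oplus P : K &= L : (\lambda\alpha.\llbracket P\rrbracket(\lambda\beta.((\lambda\gamma.\gamma\alpha)\oplus(\lambda\gamma.\gamma\beta))K)) && \text{if } L\notin\mathsf{Val}\\
V\oplus L : K &= L : (\lambda\beta.((\lambda\gamma.\gamma\Psi(V))\oplus(\lambda\gamma.\gamma\beta))K) && \text{if } L\notin\mathsf{Val}\\
V\oplus W : K &= ((\lambda\gamma.\gamma\Psi(V))\oplus(\lambda\gamma.\gamma\Psi(W)))K
\end{aligned}
$$

The operator ":" can be naturally extended to distributions of values; in this case we use the
notation $\mathscr{D} : K$.

The following lemmas exploit the operator ":" and give important intermediate results towards
the proof of Theorem 5. As anticipated, Lemma 11 shows that $M : K$ is the term obtained from
$\llbracket M\rrbracket K$ by performing all the structural (administrative) reductions:

Lemma 11 For all $M\in\Lambda_\oplus$, $\llbracket M\rrbracket K\mapsto_{\mathsf{n}}^{*} M : K$.

Proof. By induction on $M$ and on the definition of ":". Some interesting cases: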
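• If $M$ is a value, then

$$\llbracket M\rrbracket K = (\lambda\varepsilon.\varepsilon\Psi(M))K\mapsto_{\mathsf{n}} K\Psi(M) = M : K.$$

• If $M$ is $L\oplus R$, there are three subcases:

1. $L\notin\mathsf{Val}$. Then

$$
\begin{aligned}
\llbracket L\oplus R\rrbracket K &= (\lambda\varepsilon.\llbracket L\rrbracket(\lambda\alpha.\llbracket R\rrbracket(\lambda\beta.((\lambda\gamma.\gamma\alpha)\oplus(\lambda\gamma.\gamma\beta))\varepsilon)))K\\
&\mapsto_{\mathsf{n}} \llbracket L\rrbracket(\lambda\alpha.\llbracket R\rrbracket(\lambda\beta.((\lambda\gamma.\gamma\alpha)\oplus(\lambda\gamma.\gamma\beta))K))\\
&\mapsto_{\mathsf{n}}^{*} L : (\lambda\alpha.\llbracket R\rrbracket(\lambda\beta.((\lambda\gamma.\gamma\alpha)\oplus(\lambda\gamma.\gamma\beta))K))\\
&= M : K
\end{aligned}
$$

2. $L\in\mathsf{Val}$ and $R\notin\mathsf{Val}$. Then

$$
\begin{aligned}
\llbracket L\oplus R\rrbracket K &\mapsto_{\mathsf{n}} \llbracket L\rrbracket(\lambda\alpha.\llbracket R\rrbracket(\lambda\beta.((\lambda\gamma.\gamma\alpha)\oplus(\lambda\gamma.\gamma\beta))K))\\
&\mapsto_{\mathsf{n}}^{*} L : (\lambda\alpha.\llbracket R\rrbracket(\lambda\beta.((\lambda\gamma.\gamma\alpha)\oplus(\lambda\gamma.\gamma\beta))K))\\
&= (\lambda\alpha.\llbracket R\rrbracket(\lambda\beta.((\lambda\gamma.\gamma\alpha)\oplus(\lambda\gamma.\gamma\beta))K))\Psi(L)\\
&\mapsto_{\mathsf{n}} \llbracket R\rrbracket(\lambda\beta.((\lambda\gamma.\gamma\Psi(L))\oplus(\lambda\gamma.\gamma\beta))K)\\
&\mapsto_{\mathsf{n}}^{*} R : (\lambda\beta.((\lambda\gamma.\gamma\Psi(L))\oplus(\lambda\gamma.\gamma\beta))K) = M : K
\end{aligned}
$$

3. $L, R\in\mathsf{Val}$. As in subcase 2, $\llbracket L\oplus R\rrbracket K\mapsto_{\mathsf{n}}^{*}\llbracket R\rrbracket(\lambda\beta.((\lambda\gamma.\gamma\Psi(L))\oplus(\lambda\gamma.\gamma\beta))K)$, and since $R$ is a value:

$$
\begin{aligned}
\llbracket R\rrbracket(\lambda\beta.((\lambda\gamma.\gamma\Psi(L))\oplus(\lambda\gamma.\gamma\beta))K) &\mapsto_{\mathsf{n}} (\lambda\beta.((\lambda\gamma.\gamma\Psi(L))\oplus(\lambda\gamma.\gamma\beta))K)\Psi(R)\\
&\mapsto_{\mathsf{n}} ((\lambda\gamma.\gamma\Psi(L))\oplus(\lambda\gamma.\gamma\Psi(R)))K = L\oplus R : K.
\end{aligned}
$$

This concludes the proof. □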

As a technical tool, we need a generalization of the small-step semantic relation $\Rightarrow_{\mathsf{n}}$,
which we denote $\rightsquigarrow_{\mathsf{n}}$. The new relation is defined inductively like $\Rightarrow_{\mathsf{n}}$, but its first two rules
are replaced by the two rules

$$M\rightsquigarrow_{\mathsf{n}}\{M^1\}\qquad\qquad M\rightsquigarrow_{\mathsf{n}}\emptyset$$

This means, in particular, that the relation above maps terms to distributions over terms, rather
than distributions over values.

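Of course, the relations $\Rightarrow_{\mathsf{n}}$ and $\rightsquigarrow_{\mathsf{n}}$ are strongly related, as shown in the following lemma:

Lemma 12 If $M\rightsquigarrow_{\mathsf{n}}\mathscr{D}$, where $\mathsf{S}(\mathscr{D})\subseteq\mathsf{Val}$, then $M\Rightarrow_{\mathsf{n}}\mathscr{D}$. Conversely, if $M\Rightarrow_{\mathsf{n}}\mathscr{D}$, then
$M\rightsquigarrow_{\mathsf{n}}\mathscr{D}$.

Proof. Simple inductions. □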

Lemma 13 is an auxiliary tool for the proof of Lemma 14. Together, these lemmas extend
Plotkin's lemma stating that whenever a term $M$ reduces in call-by-value to a term $N$, there
is a call-by-name computation from $M : K$ to $N : K$. In $\Lambda_\oplus$ we have to deal with the
nondeterministic nature of our relations $\mapsto_{\mathsf{v}}$ and $\mapsto_{\mathsf{n}}$ (which map terms to sequences of terms),
and with the fact that evaluating a term returns a probability distribution on values rather
than a single value.
Lemma 13 If $M\mapsto_{\mathsf{v}} N_1,\ldots,N_n$ and $K$ is a value, then there are terms $P_1,\ldots,P_n$ such that
$M : K\mapsto_{\mathsf{n}}^{*} P_1,\ldots,P_n$ and $P_i\mapsto_{\mathsf{n}}^{*} N_i : K$ for every $i\in\{1,\ldots,n\}$.

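Proof. By induction on the structure of $M$. Some interesting cases:

• If $M = L\oplus R$ with $L,R\in\mathsf{Val}$, then $M\mapsto_{\mathsf{v}} L, R$ and, by definition,

$$
\begin{aligned}
M : K &= ((\lambda\gamma.\gamma\Psi(L))\oplus(\lambda\gamma.\gamma\Psi(R)))K\mapsto_{\mathsf{n}}(\lambda\gamma.\gamma\Psi(L))K,\ (\lambda\gamma.\gamma\Psi(R))K\\
(\lambda\gamma.\gamma\Psi(L))K &\mapsto_{\mathsf{n}} K\Psi(L) = L : K\\
(\lambda\gamma.\gamma\Psi(R))K &\mapsto_{\mathsf{n}} K\Psi(R) = R : K
\end{aligned}
$$

• If $M = L\oplus R$ with $L\notin\mathsf{Val}$, then $L\mapsto_{\mathsf{v}} S_1,\ldots,S_n$ and $N_i = S_i\oplus R$. Let
$J = \lambda\alpha.\llbracket R\rrbracket(\lambda\beta.((\lambda\gamma.\gamma\alpha)\oplus(\lambda\gamma.\gamma\beta))K)$, so that $M : K = L : J$. By induction hypothesis, there are terms $Q_1,\ldots,Q_n$ such that

$$M : K = L : J\mapsto_{\mathsf{n}}^{*} Q_1,\ldots,Q_n\qquad\text{and}\qquad Q_i\mapsto_{\mathsf{n}}^{*} S_i : J.$$

Now, if $S_i$ is a value, then

$$S_i : J = J\Psi(S_i)\mapsto_{\mathsf{n}}\llbracket R\rrbracket(\lambda\beta.((\lambda\gamma.\gamma\Psi(S_i))\oplus(\lambda\gamma.\gamma\beta))K)\mapsto_{\mathsf{n}}^{*} S_i\oplus R : K,$$

where the last step follows from Lemma 11 (followed, when $R$ is a value, by one further $\beta$-step).
If $S_i$ is not a value, then $S_i : J$ is itself $S_i\oplus R : K$.

• If $M = L\oplus R$ with $L\in\mathsf{Val}$ and $R\notin\mathsf{Val}$, then $R\mapsto_{\mathsf{v}} S_1,\ldots,S_n$ and $N_i = L\oplus S_i$. This case is
similar to the previous one.

This concludes the proof. □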
Lemma 14 If $M\Rightarrow_{\mathsf{v}}\mathscr{D}$ and $K$ is a closed value, then $M : K\rightsquigarrow_{\mathsf{n}}\mathscr{D} : K$.

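Proof. By induction on the complexity of a derivation for $M\Rightarrow_{\mathsf{v}}\mathscr{D}$.

• If $M\Rightarrow_{\mathsf{v}}\emptyset$, then clearly $M : K\rightsquigarrow_{\mathsf{n}}\emptyset : K$, simply because $\emptyset : K = \emptyset$.

• If $M\mapsto_{\mathsf{v}} N_1,\ldots,N_n$ and $N_i\Rightarrow_{\mathsf{v}}\mathscr{E}_i$, where $\mathscr{D} = \sum_{i=1}^n\frac{1}{n}\mathscr{E}_i$, then, by Lemma 13, $M : K\mapsto_{\mathsf{n}}^{*} L_1,\ldots,L_n$
and $L_i\mapsto_{\mathsf{n}}^{*} N_i : K$. By induction hypothesis, $N_i : K\rightsquigarrow_{\mathsf{n}}\mathscr{E}_i : K$, and so we can easily form a
derivation for $M : K\rightsquigarrow_{\mathsf{n}}\sum_{i=1}^n\frac{1}{n}(\mathscr{E}_i : K)$. But clearly,

$$\sum_{i=1}^n\frac{1}{n}(\mathscr{E}_i : K) = \left(\sum_{i=1}^n\frac{1}{n}\mathscr{E}_i\right) : K = \mathscr{D} : K.$$

This concludes the proof. □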

Lemma 15 shows that the call-by-value evaluation of a term $M$ dominates, in terms of distributions,
the call-by-name evaluation of the term $M : \lambda x.x$, up to the translation given by the
function $\Psi$.



Lemma 15 If $M : \lambda x.x\Rightarrow_{\mathsf{n}}\mathscr{D}$, then $M\Rightarrow_{\mathsf{v}}\mathscr{E}$, where $\Psi(\mathscr{E})\geq\mathscr{D}$.

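Proof. The proof goes by induction on the complexity of a derivation $\pi$ of $M : \lambda x.x\Rightarrow_{\mathsf{n}}\mathscr{D}$. Some
interesting cases (we write $I$ for $\lambda x.x$):
• If $M$ is a value $V$:

  • if $V : I\Rightarrow_{\mathsf{n}}\emptyset$, then $\mathscr{E} = \emptyset$ itself;

  • if $V : I\Rightarrow_{\mathsf{n}}\mathscr{D}$ with $\mathscr{D}\neq\emptyset$, then $\mathscr{D} = \{\Psi(V)^1\}$, and we take $\mathscr{E} = \{V^1\}$.
• If $M$ is not a value:

  • if $M : I\Rightarrow_{\mathsf{n}}\emptyset$, the thesis follows trivially;

  • if $M : I\Rightarrow_{\mathsf{n}}\mathscr{D}$ with $\mathscr{D}\neq\emptyset$, let $N_1,\ldots,N_n$ be such that $M\mapsto_{\mathsf{v}} N_1,\ldots,N_n$. By Lemma 13, there
  must be derivations $\rho_1,\ldots,\rho_n$ (all smaller than $\pi$) such that $\rho_i : N_i : \lambda x.x\Rightarrow_{\mathsf{n}}\mathscr{D}_i$ and
  $\mathscr{D} = \sum_{i=1}^n\frac{1}{n}\mathscr{D}_i$. By induction hypothesis, $N_i\Rightarrow_{\mathsf{v}}\mathscr{E}_i$, where $\Psi(\mathscr{E}_i)\geq\mathscr{D}_i$, and so we can form
  a derivation of $M\Rightarrow_{\mathsf{v}}\sum_{i=1}^n\frac{1}{n}\mathscr{E}_i$. But clearly,

$$\Psi\left(\sum_{i=1}^n\frac{1}{n}\mathscr{E}_i\right) = \sum_{i=1}^n\frac{1}{n}\Psi(\mathscr{E}_i)\geq\sum_{i=1}^n\frac{1}{n}\mathscr{D}_i = \mathscr{D}.$$

This concludes the proof. □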

By means of Lemma 11, Lemma 14, Lemma 12 and Lemma 15, it is possible to prove Theorem 5.

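Proof. [Theorem 5] To prove that $\Psi(\mathscr{S}_{\mathsf{v}}(M)) = \mathscr{S}_{\mathsf{n}}(\llbracket M\rrbracket(\lambda x.x))$, we proceed by showing the
following:

1. If $M\Rightarrow_{\mathsf{v}}\mathscr{D}$, then $\llbracket M\rrbracket(\lambda x.x)\Rightarrow_{\mathsf{n}}\mathscr{E}$ and $\mathscr{E}\geq\Psi(\mathscr{D})$.

2. If $\llbracket M\rrbracket(\lambda x.x)\Rightarrow_{\mathsf{n}}\mathscr{D}$, then $M\Rightarrow_{\mathsf{v}}\mathscr{E}$, where $\Psi(\mathscr{E})\geq\mathscr{D}$.

To prove point 1, we observe that if $M\Rightarrow_{\mathsf{v}}\mathscr{D}$, then both $\llbracket M\rrbracket K\mapsto_{\mathsf{n}}^{*} M : K$ (by Lemma 11) and
$M : K\rightsquigarrow_{\mathsf{n}}\mathscr{D} : K$ (by Lemma 14). Now, notice that, for every value $V$,

$$V : (\lambda x.x) = (\lambda x.x)\Psi(V)\mapsto_{\mathsf{n}}\Psi(V).$$

As a consequence, by Lemma 12, $\llbracket M\rrbracket(\lambda x.x)\Rightarrow_{\mathsf{n}}\Psi(\mathscr{D})$.

Point 2 is nothing more than Lemma 15. This concludes the proof. □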
7.2 Simulating Call-by-Name with Call-by-Value



In Section 7.1 we proved the "imperfect" simulation of call-by-value by the call-by-name strategy. The
inverse direction is provable in a very similar way.

Let us define the extended language $\Lambda_\oplus^{\mathcal{C}}$ as in Section 7.1.



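Definition 11 (Call-by-name Translation) The translation map $\llbracket\cdot\rrbracket$ from $\Lambda_\oplus$ to $\Lambda_\oplus^{\mathcal{C}}$ is
recursively defined as follows:

$$
\begin{aligned}
\llbracket x\rrbracket &= x\\
\llbracket\lambda x.M\rrbracket &= \lambda\varepsilon.\varepsilon(\lambda x.\llbracket M\rrbracket)\\
\llbracket MN\rrbracket &= \lambda\varepsilon.\llbracket M\rrbracket(\lambda\alpha.\alpha\llbracket N\rrbracket\varepsilon)\\
\llbracket M\oplus N\rrbracket &= \lambda\varepsilon.(((\lambda\alpha.\llbracket M\rrbracket\alpha)\oplus(\lambda\alpha.\llbracket N\rrbracket\alpha))\varepsilon)
\end{aligned}
$$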

We also define a function $\Phi$, which sends values to terms, as follows:

$$\Phi(x) = x(\lambda y.y);\qquad \Phi(\lambda x.M) = \lambda x.\llbracket M\rrbracket.$$

Observe that $\Phi$ sends closed values to closed values. As such, it can be naturally extended to
distributions of closed values.

It is possible to state and prove the dual of Theorem 5:

Theorem 6 (Simulation) For every $M$, $\Phi(\mathscr{S}_{\mathsf{n}}(M)) = \mathscr{S}_{\mathsf{v}}(\llbracket M\rrbracket(\lambda x.x))$.
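As for Theorem 5, the statement can be spot-checked mechanically. The Python sketch below transcribes Definition 11 (assuming the variable clause $\llbracket x\rrbracket = x$, which is the one Lemma 16 relies on) together with $\Phi$, evaluates $M$ under call-by-name and $\llbracket M\rrbracket(\lambda x.x)$ under call-by-value, and compares the results. The tuple encoding, the fuel bound, the strict call-by-value treatment of $\oplus$ and the example terms are illustrative assumptions.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical tuple encoding: ("var", x) | ("lam", x, M) | ("app", M, N) | ("plus", M, N)
def var(x): return ("var", x)
def lam(x, body): return ("lam", x, body)
def plus(m, n): return ("plus", m, n)
def app(*ts):                                   # left-associated application
    t = ts[0]
    for u in ts[1:]:
        t = ("app", t, u)
    return t

EPS, ALPHA = "ε", "α"                           # continuation variables (Greek, disjoint)

def cbn(m):
    """Definition 11: the call-by-name translation, with [[x]] = x."""
    if m[0] == "var":
        return m
    if m[0] == "lam":                           # λε.ε(λx.[[M]])
        return lam(EPS, app(var(EPS), phi(m)))
    if m[0] == "app":                           # λε.[[M]](λα.α[[N]]ε)
        return lam(EPS, app(cbn(m[1]), lam(ALPHA, app(var(ALPHA), cbn(m[2]), var(EPS)))))
    choice = plus(lam(ALPHA, app(cbn(m[1]), var(ALPHA))),   # (λα.[[M]]α)⊕(λα.[[N]]α)
                  lam(ALPHA, app(cbn(m[2]), var(ALPHA))))
    return lam(EPS, app(choice, var(EPS)))

def phi(v):
    """Phi(x) = x(λy.y) and Phi(λx.M) = λx.[[M]]."""
    return app(v, lam("y", var("y"))) if v[0] == "var" else lam(v[1], cbn(v[2]))

def subst(t, x, s):
    """t{s/x} for a closed s (so no capture can occur)."""
    if t[0] == "var":
        return s if t[1] == x else t
    if t[0] == "lam":
        return t if t[1] == x else lam(t[1], subst(t[2], x, s))
    return (t[0], subst(t[1], x, s), subst(t[2], x, s))

def evaluate(t, by_value, fuel=500):
    """Exact (sub)distribution of values of a closed term; fuel bounds beta steps per path."""
    out = defaultdict(Fraction)
    if t[0] == "lam":
        out[t] += 1
    elif t[0] == "plus" and by_value:           # CBV: both operands first, then a fair choice
        for l, p in evaluate(t[1], True, fuel).items():
            for r, q in evaluate(t[2], True, fuel).items():
                out[l] += p * q / 2
                out[r] += p * q / 2
    elif t[0] == "plus":                        # CBN: fair choice, then evaluate the branch
        for branch in (t[1], t[2]):
            for v, p in evaluate(branch, False, fuel).items():
                out[v] += p / 2
    elif t[0] == "app" and fuel > 0:
        for f, p in evaluate(t[1], by_value, fuel).items():
            args = evaluate(t[2], True, fuel).items() if by_value else [(t[2], 1)]
            for a, q in args:
                for v, r in evaluate(subst(f[2], f[1], a), by_value, fuel - 1).items():
                    out[v] += p * q * r
    return dict(out)

I = lam("x", var("x"))
T = lam("x", lam("y", var("x")))
F = lam("x", lam("y", var("y")))
XOR = lam("a", lam("b", app(var("a"), app(var("b"), F, T), var("b"))))
N_XOR = lam("z", app(XOR, var("z"), var("z")))
M = lam("x", app(var("x"), plus(T, F)))

def simulates(term):
    """Check Theorem 6 on one closed term: Phi(S_n(M)) == S_v([[M]](λx.x))."""
    lhs = {phi(v): p for v, p in evaluate(term, False).items()}
    return lhs == evaluate(app(cbn(term), I), True)

print([simulates(t) for t in (plus(T, F), app(I, plus(T, F)), app(M, N_XOR))])
```

The last test term is the duplicating term $MN_{\mathit{xor}}$ of Example 7: under the translation, call-by-value passes the thunk $\llbracket M_\top\oplus M_\bot\rrbracket$ unevaluated, so its two copies are still chosen independently.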

To prove Theorem 6 we need some technical lemmas, as in Section 7.1. We retrace the same proof
techniques.
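Lemma 16 $\llbracket M\rrbracket\{\llbracket N\rrbracket/x\} = \llbracket M\{N/x\}\rrbracket$.

Proof. By induction on the size of $M$. We give here only the sum case. See [] for the other
cases.

• If $M = L\oplus R$ then:

$$
\begin{aligned}
\llbracket L\oplus R\rrbracket\{\llbracket N\rrbracket/x\} &\stackrel{\mathrm{def}}{=} \lambda\varepsilon.(((\lambda\alpha.\llbracket L\rrbracket\alpha)\oplus(\lambda\alpha.\llbracket R\rrbracket\alpha))\varepsilon)\{\llbracket N\rrbracket/x\}\\
&= \lambda\varepsilon.(((\lambda\alpha.\llbracket L\rrbracket\{\llbracket N\rrbracket/x\}\alpha)\oplus(\lambda\alpha.\llbracket R\rrbracket\{\llbracket N\rrbracket/x\}\alpha))\varepsilon)\\
&\stackrel{\mathrm{IH}}{=} \lambda\varepsilon.(((\lambda\alpha.\llbracket L\{N/x\}\rrbracket\alpha)\oplus(\lambda\alpha.\llbracket R\{N/x\}\rrbracket\alpha))\varepsilon)\\
&\stackrel{\mathrm{def}}{=} \llbracket (L\oplus R)\{N/x\}\rrbracket
\end{aligned}
$$

This concludes the proof. □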
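Now the infix operator is defined as:

$$
\begin{aligned}
V : K &= K\Phi(V)\\
LP : K &= L : (\lambda\alpha.\alpha\llbracket P\rrbracket K) && \text{if } L\notin\mathsf{Val}\\
VL : K &= \Phi(V)\llbracket L\rrbracket K && \text{if } V\in\mathsf{Val} \text{ and } L\notin\mathsf{Val}\\
L\oplus P : K &= ((\lambda\alpha.\llbracket L\rrbracket\alpha)\oplus(\lambda\alpha.\llbracket P\rrbracket\alpha))K
\end{aligned}
$$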

Lemma 17 For all $M\in\Lambda_\oplus$, $\llbracket M\rrbracket K\mapsto_{\mathsf{v}}^{*} M : K$.

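Proof. By induction on the term $M$. We give here only the sum case. See [] for the other cases.

$$\llbracket L\oplus P\rrbracket K = (\lambda\varepsilon.(((\lambda\alpha.\llbracket L\rrbracket\alpha)\oplus(\lambda\alpha.\llbracket P\rrbracket\alpha))\varepsilon))K\mapsto_{\mathsf{v}}((\lambda\alpha.\llbracket L\rrbracket\alpha)\oplus(\lambda\alpha.\llbracket P\rrbracket\alpha))K = L\oplus P : K$$

□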

Lemma 18 If $M\mapsto_{\mathsf{n}} N_1,\ldots,N_n$ and $K$ is a value, then there are terms $P_1,\ldots,P_n$ such that
$M : K\mapsto_{\mathsf{v}}^{*} P_1,\ldots,P_n$ and $P_i\mapsto_{\mathsf{v}}^{*} N_i : K$ for every $i\in\{1,\ldots,n\}$.

As a technical tool, we need a generalization of the small-step semantic relation $\Rightarrow_{\mathsf{v}}$, which
we denote $\rightsquigarrow_{\mathsf{v}}$. It is defined inductively like $\Rightarrow_{\mathsf{v}}$, but its first two rules are replaced by
the two rules

$$M\rightsquigarrow_{\mathsf{v}}\{M^1\}\qquad\qquad M\rightsquigarrow_{\mathsf{v}}\emptyset$$

This means, in particular, that the relation above maps terms to distributions over terms, rather
than distributions over values.

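Lemma 19 If $M\rightsquigarrow_{\mathsf{v}}\mathscr{D}$, where $\mathsf{S}(\mathscr{D})\subseteq\mathsf{Val}$, then $M\Rightarrow_{\mathsf{v}}\mathscr{D}$. Conversely, if $M\Rightarrow_{\mathsf{v}}\mathscr{D}$, then
$M\rightsquigarrow_{\mathsf{v}}\mathscr{D}$.

Proof. Simple inductions. □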
Lemma 20 If $M\Rightarrow_{\mathsf{n}}\mathscr{D}$, then $M : K\rightsquigarrow_{\mathsf{v}}\mathscr{D} : K$.






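Proof. By induction on the complexity of a derivation for $M\Rightarrow_{\mathsf{n}}\mathscr{D}$.
• If $M\Rightarrow_{\mathsf{n}}\emptyset$, then clearly $M : K\rightsquigarrow_{\mathsf{v}}\emptyset : K$, simply because $\emptyset : K = \emptyset$.

• If $M\mapsto_{\mathsf{n}} N_1,\ldots,N_n$ and $N_i\Rightarrow_{\mathsf{n}}\mathscr{E}_i$, where $\mathscr{D} = \sum_{i=1}^n\frac{1}{n}\mathscr{E}_i$, then, by Lemma 18, $M : K\mapsto_{\mathsf{v}}^{*} L_1,\ldots,L_n$
and $L_i\mapsto_{\mathsf{v}}^{*} N_i : K$. By induction hypothesis, $N_i : K\rightsquigarrow_{\mathsf{v}}\mathscr{E}_i : K$, and so we can easily form a
derivation for $M : K\rightsquigarrow_{\mathsf{v}}\sum_{i=1}^n\frac{1}{n}(\mathscr{E}_i : K)$. But clearly:

$$\sum_{i=1}^n\frac{1}{n}(\mathscr{E}_i : K) = \left(\sum_{i=1}^n\frac{1}{n}\mathscr{E}_i\right) : K = \mathscr{D} : K.$$

This concludes the proof. □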
Lemma 21 If $M : \lambda x.x\Rightarrow_{\mathsf{v}}\mathscr{D}$, then $M\Rightarrow_{\mathsf{n}}\mathscr{E}$, where $\Phi(\mathscr{E})\geq\mathscr{D}$.

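Proof. The proof goes by induction on the complexity of the derivation $\pi$ of $M : \lambda x.x\Rightarrow_{\mathsf{v}}\mathscr{D}$.
Some interesting cases (again, $I$ stands for $\lambda x.x$):

• If $M$ is a value $V$:

  • if $V : I\Rightarrow_{\mathsf{v}}\emptyset$, then $\mathscr{E} = \emptyset$ itself;

  • if $V : I\Rightarrow_{\mathsf{v}}\mathscr{D}$ with $\mathscr{D}\neq\emptyset$, then $\mathscr{D} = \{\Phi(V)^1\}$, and we take $\mathscr{E} = \{V^1\}$.

• If $M$ is not a value:

  • if $M : I\Rightarrow_{\mathsf{v}}\emptyset$, the thesis follows trivially;

  • if $M : I\Rightarrow_{\mathsf{v}}\mathscr{D}$ with $\mathscr{D}\neq\emptyset$, let $N_1,\ldots,N_n$ be such that $M\mapsto_{\mathsf{n}} N_1,\ldots,N_n$. By Lemma 18, there
  must be derivations $\rho_1,\ldots,\rho_n$ (all smaller than $\pi$) such that $\rho_i : N_i : \lambda x.x\Rightarrow_{\mathsf{v}}\mathscr{D}_i$ and
  $\mathscr{D} = \sum_{i=1}^n\frac{1}{n}\mathscr{D}_i$. By induction hypothesis, $N_i\Rightarrow_{\mathsf{n}}\mathscr{E}_i$, where $\Phi(\mathscr{E}_i)\geq\mathscr{D}_i$, and so we can form
  a derivation of $M\Rightarrow_{\mathsf{n}}\sum_{i=1}^n\frac{1}{n}\mathscr{E}_i$. But clearly,

$$\Phi\left(\sum_{i=1}^n\frac{1}{n}\mathscr{E}_i\right) = \sum_{i=1}^n\frac{1}{n}\Phi(\mathscr{E}_i)\geq\sum_{i=1}^n\frac{1}{n}\mathscr{D}_i = \mathscr{D}.$$

This concludes the proof. □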



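Proof. [Theorem 6] To prove that $\Phi(\mathscr{S}_{\mathsf{n}}(M)) = \mathscr{S}_{\mathsf{v}}(\llbracket M\rrbracket(\lambda x.x))$, we proceed by showing the
following:

1. If $M\Rightarrow_{\mathsf{n}}\mathscr{D}$, then $\llbracket M\rrbracket(\lambda x.x)\Rightarrow_{\mathsf{v}}\mathscr{E}$ and $\mathscr{E}\geq\Phi(\mathscr{D})$.

2. If $\llbracket M\rrbracket(\lambda x.x)\Rightarrow_{\mathsf{v}}\mathscr{D}$, then $M\Rightarrow_{\mathsf{n}}\mathscr{E}$, where $\Phi(\mathscr{E})\geq\mathscr{D}$.

To prove point 1, we observe that if $M\Rightarrow_{\mathsf{n}}\mathscr{D}$, then both $\llbracket M\rrbracket K\mapsto_{\mathsf{v}}^{*} M : K$ (by Lemma 17) and
$M : K\rightsquigarrow_{\mathsf{v}}\mathscr{D} : K$ (by Lemma 20). Now, notice that, for every value $V$,

$$V : (\lambda x.x) = (\lambda x.x)\Phi(V)\mapsto_{\mathsf{v}}\Phi(V).$$

As a consequence, by Lemma 19, $\llbracket M\rrbracket(\lambda x.x)\Rightarrow_{\mathsf{v}}\Phi(\mathscr{D})$.

Point 2 is nothing more than Lemma 21. This concludes the proof. □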

8 On the Expressive Power of $\Lambda_\oplus$

The lambda calculus $\Lambda_\oplus$ is endowed with a very restricted form of (probabilistic) choice, which is
binary and such that both possible outcomes have probability 1/2. It is thus natural to ask
whether this is an essential restriction. In this section, we show that it is not, by
proving that the set of representable distributions on the natural numbers, namely those which can
be denoted by a term of $\Lambda_\oplus$, equals the set of computable distributions, defined in terms of Turing
machines. Throughout this section we work with call-by-value reduction, although everything could
be rephrased in call-by-name, in view of the results in Section 7.1.
We first need to define what a computable distribution (on the natural numbers) is. This
is based on a notion of approximation for the function assigning probabilities to natural numbers.

In $\Lambda_\oplus$ we are able to represent (up to approximation) probability functions from a suitable
domain to the real interval $\mathbb{R}_{[0,1]}$, so the notion of approximating function is necessary:

Definition 12 (Approximating function) Given any function $f : Q\to\mathbb{R}_{[0,1]}$, the approximating
function for $f$, denoted $\mathit{Appr}_f$, is the function from $Q\times\mathbb{N}$ to $\{0,1\}^*$ which, on input $(a,n)$,
returns the binary string of length $n$ containing the first $n$ digits of $f(a)$ in binary notation.

We now define the class of computable distributions:

Definition 13 (Computable Distributions) A distribution $\mathcal{D} : \mathbb{N}\to\mathbb{R}_{[0,1]}$ is computable iff
the function $\mathit{Appr}_{\mathcal{D}} : \mathbb{N}\times\mathbb{N}\to\{0,1\}^*$ is computable.
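A minimal Python sketch of $\mathit{Appr}_f$, assuming $f$ returns exact rationals (`Fraction`s) rather than arbitrary reals. Definition 12 does not say which expansion to use for dyadic values; the sketch takes the finite one (so $1/2$ gives `1000`) and returns all ones for $f(a) = 1$. The name `appr` and the geometric example distribution are illustrative.

```python
from fractions import Fraction

def appr(f, a, n):
    """First n binary digits of f(a), for f(a) an exact rational in [0, 1]."""
    x = Fraction(f(a))
    if not 0 <= x <= 1:
        raise ValueError("f(a) must lie in [0, 1]")
    digits = []
    for _ in range(n):
        x *= 2                       # shift the next binary digit before the point
        bit = 1 if x >= 1 else 0
        digits.append(str(bit))
        x -= bit
    return "".join(digits)

# A hypothetical computable distribution on N: D(k) = 2^-(k+1).
geometric = lambda k: Fraction(1, 2 ** (k + 1))
print(appr(geometric, 0, 4), appr(geometric, 2, 4), appr(lambda k: Fraction(1, 3), 0, 6))
```

This prints `1000 0010 010101`.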

In the rest of the section we will use the following notation:

Notation 2 Given a proper distribution $\mathscr{D}$ that assigns nonzero probability only to representations
of natural numbers, the corresponding probability distribution over the naturals will be denoted by
$\{\mathscr{D}\}$.

The next step consists in understanding which class of distributions on the natural numbers
can be captured by $\Lambda_\oplus$, i.e., which distributions are the semantics of a lambda term. For this purpose,
we work with a fixed encoding of the natural numbers into lambda terms, namely the one usually
attributed to Scott.

Definition 14 (Representable Distribution) A probability distribution $\mathcal{D}$ over the natural
numbers is said to be representable iff there is a lambda term $M_{\mathcal{D}}$ such that $\{\mathscr{S}_{\mathsf{v}}(M_{\mathcal{D}})\} = \mathcal{D}$.

Our main goal is to prove that the class of representable distributions coincides with the
class of computable distributions: on the one hand, each distribution obtained by small-step evaluation
of a lambda term in $\Lambda_\oplus$ is proved to be computable (Soundness, Theorem 7); on the other hand,
each computable distribution can be represented by a term in $\Lambda_\oplus$ (Completeness, Theorem 8).

Whatever can be denoted by a lambda term is actually a computable distribution, i.e., one
which can be approximated to any degree of precision by a Turing machine:

Theorem 7 (Soundness) Every representable distribution is computable. 

Proof. Suppose $\mathcal{D}$ is a representable probability distribution. Then there is a term $M$ such that
$\{\mathscr{S}_{\mathsf{v}}(M)\} = \mathcal{D}$. This implies, in particular, that $\sum\mathscr{S}_{\mathsf{v}}(M) = 1$, because $\sum\mathcal{D} = 1$ itself. An
algorithm computing $\mathit{Appr}_{\mathcal{D}}$ can then be easily designed as an evaluator for $M$: on input $(a, n)$,
simply compute distributions $\mathscr{E}$ such that $M\Rightarrow_{\mathsf{v}}\mathscr{E}$, until you find one such that $\sum\mathscr{E}$ is big
enough to determine the probability of all values up to the $n$-th bit. □
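The search loop in this proof can be sketched as follows: keep producing approximations $\mathscr{E}$ with $M\Rightarrow_{\mathsf{v}}\mathscr{E}$ until the missing mass $1-\sum\mathscr{E}$ is at most $2^{-n}$; since $\sum\mathcal{D} = 1$, the probability of $a$ then lies in an interval of width at most $2^{-n}$. In the Python sketch below, the stream of approximations is a hypothetical stand-in (a geometric law) for an actual evaluator of $M$, and `bounds` is an illustrative name; the sketch stops at the interval and does not handle the extra care needed to fix exact digits when $\mathcal{D}(a)$ sits on a dyadic boundary.

```python
from fractions import Fraction
from itertools import count

def approximations():
    """Hypothetical stand-in for the increasing distributions E_k with M =>_v E_k:
    here E_k gives the value j probability 2^-(j+1) for every j < k."""
    for k in count():
        yield {j: Fraction(1, 2 ** (j + 1)) for j in range(k)}

def bounds(a, n, approxs):
    """Return (low, high) with low <= D(a) <= high and high - low <= 2^-n.
    Relies on D having total mass 1, as in the proof of Theorem 7."""
    for e in approxs:
        missing = 1 - sum(e.values())            # mass not reached yet
        if missing <= Fraction(1, 2 ** n):
            low = e.get(a, Fraction(0))
            return low, low + missing

print(bounds(0, 5, approximations()), bounds(3, 5, approximations()))
```

Here the loop stops at the sixth approximation, returning the bounds $[1/2, 17/32]$ for $a = 0$ and $[1/16, 3/32]$ for $a = 3$.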

The completeness theorem requires further definitions and some auxiliary results. To prove
expressiveness results, we exploit standard lambda calculus encodings. Natural numbers and binary
strings can be encoded as follows (where $\varepsilon$ denotes the empty string):





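$$
\begin{aligned}
\ulcorner 0\urcorner &= \lambda xy.x\\
\ulcorner n+1\urcorner &= \lambda xy.y\ulcorner n\urcorner\\
\ulcorner\varepsilon\urcorner &= \lambda xyz.x\\
\ulcorner 0\cdot s\urcorner &= \lambda xyz.y\ulcorner s\urcorner\\
\ulcorner 1\cdot s\urcorner &= \lambda xyz.z\ulcorner s\urcorner
\end{aligned}
$$

Moreover, we encode pairs of values as follows:

$$\langle V, W\rangle = \lambda x.xVW$$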
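These encodings can be transcribed into Python closures, which makes their case-analysis behaviour easy to try out. In the sketch below, the helper names `num`, `to_int`, `bits`, `to_str`, `pair`, `fst` and `snd` are hypothetical, and the decoders exist only to check round trips.

```python
def num(n):
    """Scott numeral: ⌜0⌝ = λxy.x and ⌜n+1⌝ = λxy.y⌜n⌝."""
    t = lambda x: lambda y: x
    for _ in range(n):
        t = (lambda p: lambda x: lambda y: y(p))(t)   # bind p now, not lazily
    return t

def to_int(t):
    """Decode by case analysis: the zero case returns None, the successor case ⌜n⌝."""
    n = 0
    while (t := t(None)(lambda p: p)) is not None:
        n += 1
    return n

def bits(s):
    """⌜ε⌝ = λxyz.x, ⌜0·s⌝ = λxyz.y⌜s⌝, ⌜1·s⌝ = λxyz.z⌜s⌝."""
    t = lambda x: lambda y: lambda z: x
    for b in reversed(s):
        if b == "0":
            t = (lambda p: lambda x: lambda y: lambda z: y(p))(t)
        else:
            t = (lambda p: lambda x: lambda y: lambda z: z(p))(t)
    return t

def to_str(t):
    out = []
    while (r := t(None)(lambda p: ("0", p))(lambda p: ("1", p))) is not None:
        out.append(r[0])
        t = r[1]
    return "".join(out)

def pair(v, w):
    return lambda x: x(v)(w)                           # ⟨V, W⟩ = λx.xVW

fst = lambda p: p(lambda a: lambda b: a)
snd = lambda p: p(lambda a: lambda b: b)

print(to_int(num(5)), to_str(bits("0110")), to_int(fst(pair(num(2), num(3)))))
```

This prints `5 0110 2` (the walrus operator requires Python 3.8 or later).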
We define the class of the so-called finite distribution terms: 
Definition 15 (Finite distribution terms) Finite distribution terms are lambda terms generated
inductively as follows:

• For every $n$, $\lambda xy.x\ulcorner n\urcorner$ is a finite distribution term;

• If $M$ and $N$ are finite distribution terms, then $\lambda xy.yMN$ is a finite distribution term.

Given a finite distribution term $M$, the underlying probability distribution on the natural numbers
$\{M\}$ is defined in the natural way:

$$\{\lambda xy.x\ulcorner n\urcorner\} = \{n^1\}$$

$$\{\lambda xy.yMN\} = \frac{1}{2}\{M\} + \frac{1}{2}\{N\}.$$

Lemma 22 is an intermediate result towards Lemma 23.

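Lemma 22 (Fixed-point Combinator) There is a term $H$ such that, for every value $V$, $HV$
rewrites deterministically to $V(\lambda x.HVx)$.

Proof. The term $H$ is simply $WW$, where

$$W = \lambda x.\lambda y.y(\lambda z.xxyz).$$

Indeed:

$$
\begin{aligned}
HV = WWV &\mapsto_{\mathsf{v}} (\lambda y.y(\lambda z.WWyz))V\\
&\mapsto_{\mathsf{v}} V(\lambda z.WWVz)\\
&= V(\lambda z.(HV)z).
\end{aligned}
$$

This concludes the proof. □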
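Since Python evaluates arguments eagerly, like the call-by-value reduction used in this section, $H$ can be transcribed directly into Python lambdas; the inner $\lambda z$ delays the self-application, so building $H$ terminates. The factorial functional `V` below is a hypothetical example of a value to which Lemma 22 applies.

```python
W = lambda x: lambda y: y(lambda z: x(x)(y)(z))    # W = λx.λy.y(λz.xxyz)
H = W(W)                                           # H = WW

# Hypothetical value V: the functional whose fixed point is factorial.
V = lambda f: lambda n: 1 if n == 0 else n * f(n - 1)
fact = H(V)                                        # HV behaves as V(λz.HVz)
print(fact(5))                                     # 120
```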

Lemma 23 There is a lambda term $M_{\mathit{fdt}}$ such that, for every finite distribution term $N$, $\{\mathscr{S}_{\mathsf{v}}(M_{\mathit{fdt}}N)\} =
\{N\}$.

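Proof. The term $M_{\mathit{fdt}}$ is simply $HV$, where $V$ is

$$\lambda xy.y(\lambda z.z)(\lambda zw.(xz)\oplus(xw)).$$

Indeed, by induction (where $X$ is $\lambda x.HVx$):

$$
\begin{aligned}
M_{\mathit{fdt}}(\lambda xy.x\ulcorner n\urcorner) &\mapsto_{\mathsf{v}}^{*} (\lambda xy.x\ulcorner n\urcorner)(\lambda z.z)(\lambda zw.(Xz)\oplus(Xw))\\
&\mapsto_{\mathsf{v}}^{*} (\lambda z.z)\ulcorner n\urcorner\mapsto_{\mathsf{v}}\ulcorner n\urcorner;\\
M_{\mathit{fdt}}(\lambda xy.yMN) &\mapsto_{\mathsf{v}}^{*} VX(\lambda xy.yMN)\\
&\mapsto_{\mathsf{v}}^{*} (\lambda zw.(Xz)\oplus(Xw))MN\\
&\mapsto_{\mathsf{v}}^{*} (XM)\oplus(XN)
\end{aligned}
$$

and $(XM)\oplus(XN)$, applying the induction hypothesis to the subterms, evaluates to the
distribution $\frac{1}{2}\{M\}+\frac{1}{2}\{N\}$. This concludes the proof. □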
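The following Python sketch models finite distribution terms as binary trees, a hypothetical representation in which a leaf stands for $\lambda xy.x\ulcorner n\urcorner$ and a node for $\lambda xy.yMN$. The function `denote` computes $\{M\}$ exactly, while `sample` mimics the behaviour of $M_{\mathit{fdt}}$ by flipping a fair coin at each node.

```python
import random
from fractions import Fraction

def leaf(n): return ("leaf", n)          # stands for λxy.x⌜n⌝
def node(m, n): return ("node", m, n)    # stands for λxy.yMN

def denote(t):
    """{t}: {leaf n} = {n^1} and {node M N} = 1/2 {M} + 1/2 {N}."""
    if t[0] == "leaf":
        return {t[1]: Fraction(1)}
    out = {}
    for sub in (t[1], t[2]):
        for k, p in denote(sub).items():
            out[k] = out.get(k, 0) + p / 2
    return out

def sample(t, rng=random):
    """Mirrors M_fdt: at each node the fair choice (XM) ⊕ (XN) picks a subterm."""
    while t[0] == "node":
        t = t[1] if rng.random() < 0.5 else t[2]
    return t[1]

t = node(leaf(0), node(leaf(1), leaf(2)))
print(denote(t))       # {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4)}
print(sample(t, random.Random(0)))
```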

The following proposition guarantees the existence of a kind of successive-approximation
function, which splits off, in a finite number of steps, a finite part of the distribution computed by a given term.

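Proposition 7 (Splitting) There is a lambda term $M_{\mathit{split}}$ such that, for every term $N$ computing
the distribution $\mathcal{D}$, $M_{\mathit{split}}N$ rewrites deterministically to $\langle L, P\rangle$, where

• $L$ is a finite distribution term such that $\{L\} = \mathcal{E}$;

• $P$ computes a distribution $\mathcal{F}$;

• $\mathcal{D} = \frac{1}{2}\mathcal{E} + \frac{1}{2}\mathcal{F}$.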
Proof. Only informally: observe that it is possible to find the (sub)distribution $\mathcal{E}$ in a finite
number of steps, by successively querying $N$ (which takes two natural numbers as arguments) in
dovetail order. This way, the (sub)distribution $\mathcal{E}$ can always be determined. □

By means of Lemma 23 and Proposition 7, it is possible to prove the following completeness
result:

Theorem 8 (Completeness) Every computable probability distribution $\mathcal{D}$ over the natural numbers
is representable by a term $M_{\mathcal{D}}$.

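Proof. Define a term $M$ simply as $HV$, where $V$ is

$$\lambda xy.(M_{\mathit{split}}\,y)(\lambda zw.((\lambda s.M_{\mathit{fdt}}z)\oplus(\lambda s.xw))(\lambda s.s)).$$

We will prove the following statement: for every $n\in\mathbb{N}$ and for every $N\in\Lambda_\oplus$ which computes a
distribution $\mathcal{E}$, there exists a distribution $\mathscr{D}$ such that $MN\Rightarrow_{\mathsf{v}}\mathscr{D}$, with $\sum\mathscr{D}\geq 1-\frac{1}{2^n}$ and
$\{\mathscr{D}\}\leq\mathcal{E}$. We prove the thesis by induction on the natural number $n$.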

• If $n = 0$, then $\mathscr{D} = \emptyset$ and $MN\Rightarrow_{\mathsf{v}}\emptyset$ trivially.

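• If $n > 0$, observe that $MN$ rewrites as follows:

$$MN = HVN\mapsto_{\mathsf{v}}^{*} V(\lambda t.HVt)N\mapsto_{\mathsf{v}}^{*}(M_{\mathit{split}}N)R,$$

where $R = \lambda zw.((\lambda s.M_{\mathit{fdt}}z)\oplus(\lambda s.(\lambda t.HVt)w))(\lambda s.s)$.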

Observe that, by Proposition 7, $M_{\mathit{split}}N$ rewrites deterministically to the pair $\langle L,P\rangle$, where $L$
is a finite distribution term such that $\{L\} = \mathcal{E}_L$, $P$ is a term computing a distribution
$\mathcal{E}_P$, and $\mathcal{E} = \frac{1}{2}\mathcal{E}_L + \frac{1}{2}\mathcal{E}_P$. Then:

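$$
\begin{aligned}
\langle L,P\rangle R = (\lambda x.xLP)R &\mapsto_{\mathsf{v}} (\lambda zw.((\lambda s.M_{\mathit{fdt}}z)\oplus(\lambda s.(\lambda t.HVt)w))(\lambda s.s))LP\\
&\mapsto_{\mathsf{v}}^{2} ((\lambda s.M_{\mathit{fdt}}L)\oplus(\lambda s.(\lambda t.HVt)P))(\lambda s.s)\\
&\mapsto_{\mathsf{v}} (\lambda s.M_{\mathit{fdt}}L)(\lambda s.s),\ (\lambda s.(\lambda t.HVt)P)(\lambda s.s).
\end{aligned}
$$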

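Now observe that $(\lambda s.M_{\mathit{fdt}}L)(\lambda s.s)\mapsto_{\mathsf{v}} M_{\mathit{fdt}}L$, and by Lemma 23 this term evaluates to a
distribution $\mathscr{D}_L$ with $\{\mathscr{D}_L\} = \{L\} = \mathcal{E}_L$ (in particular, $\sum\mathscr{D}_L = 1$). Moreover, $(\lambda s.(\lambda t.HVt)P)(\lambda s.s)\mapsto_{\mathsf{v}}^{2}(HV)P = MP$, and we can apply the
induction hypothesis to $P$: since $P$ computes the distribution $\mathcal{E}_P$, there is a distribution $\mathscr{D}_P$
such that $MP\Rightarrow_{\mathsf{v}}\mathscr{D}_P$, with $\sum\mathscr{D}_P\geq 1-\frac{1}{2^{n-1}}$
and $\{\mathscr{D}_P\}\leq\mathcal{E}_P$.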

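Let us take $\mathscr{D} = \frac{1}{2}\mathscr{D}_L + \frac{1}{2}\mathscr{D}_P$, so that $MN\Rightarrow_{\mathsf{v}}\mathscr{D}$. Then:

$$\sum\mathscr{D} = \frac{1}{2}\sum\mathscr{D}_L + \frac{1}{2}\sum\mathscr{D}_P\geq\frac{1}{2} + \frac{1}{2}\left(1-\frac{1}{2^{n-1}}\right) = 1-\frac{1}{2^n}.$$

Moreover, $\{\mathscr{D}\} = \frac{1}{2}\mathcal{E}_L + \frac{1}{2}\{\mathscr{D}_P\}\leq\frac{1}{2}\mathcal{E}_L + \frac{1}{2}\mathcal{E}_P = \mathcal{E}$. The thesis follows easily.

□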
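The mass bound can also be checked numerically. In the Python sketch below, `splits[k]` is a hypothetical finite distribution playing the role of $\{L\}$ at the $k$-th unfolding of $MN$; each unfolding assigns half of the remaining mass through the fair choice $(M_{\mathit{fdt}}L)\oplus(MP)$, so after $n$ unfoldings exactly $1-2^{-n}$ is reached whenever every $\{L\}$ is a probability distribution.

```python
from fractions import Fraction

def unfold(splits, n):
    """Distribution reached after n unfoldings of M N in the proof of Theorem 8.
    splits[k] is the (hypothetical) finite distribution {L} found at step k;
    the other half of the remaining mass is left to the recursive call M P."""
    dist, weight = {}, Fraction(1)
    for k in range(n):
        weight /= 2                           # the fair choice (M_fdt L) ⊕ (M P)
        for value, p in splits[k].items():
            dist[value] = dist.get(value, 0) + weight * p
    return dist

splits = [{0: Fraction(1)}, {1: Fraction(1, 2), 2: Fraction(1, 2)}, {0: Fraction(1)}]
for n in range(4):
    assert sum(unfold(splits, n).values()) == 1 - Fraction(1, 2 ** n)
print(unfold(splits, 3))    # {0: Fraction(5, 8), 1: Fraction(1, 8), 2: Fraction(1, 8)}
```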



9 Conclusions and Future Work 

In this paper we studied probabilistic operational semantics for $\Lambda_\oplus$, a nondeterministic extension
of the untyped lambda calculus. We proved strong equivalence results between small-step and big-step
semantics, both in call-by-value and in call-by-name. We also extended Plotkin's simulation to our
probabilistic setting, and we stated and proved some results about the expressive power of $\Lambda_\oplus$.

Starting from the present paper, several directions for future work are open. On the one hand,
some theoretical aspects of the calculus remain unexplored: for example, developing an observational
theory for terms would be an interesting topic. Moreover, the equivalence between inductive
and coinductive semantics seems reminiscent of the equality between outer measure and inner
measure in measure theory: it would be interesting to prove some results in this direction. On
the other hand, $\Lambda_\oplus$ can be considered as a paradigmatic language for stochastic functional
programming: it would be fascinating to analyze carefully the relationship between $\Lambda_\oplus$ and other
probabilistic languages (for example, Park's language [15, 16]).